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Abstract 

Fully  polarized  Xpaich  signatures  are  transformed  to  two  left  circularly  polarized  signals.  These 
two  signals  are  then  filtered  by  a  linear  FM  pulse  compression  (‘chirp’)  transfer  function,  corrupted  by 
AWGN,  and  filtered  by  a  filter  matched  to  the  ‘chirp’  transfer  function.  The  bandwidth  of  the  ‘chirp’ 
radar  is  about  7S0  MHz.  Range  profile  feature  extraction  is  performed  using  the  TLS  Prony  Model 
parameter  estimation  technique  developed  at  Ohio  State  University.  Using  the  Prony  Model,  each 
scattering  center  is  described  by  a  polarization  ellipse,  relative  energy,  frequency  response,  and  range. 
This  representation  of  the  target  is  vector  quantized  using  a  K-means  clustering  algorithm.  Sequences 
of  vector  quantized  scattering  centers  as  well  as  sequences  of  vector  quantized  range  profiles  are  used 
to  synthesize  target  specific  Hidden  Markov  Models  (HMM’s).  The  identification  decision  is  made  by 
determining  which  HMM  has  the  highest  probability  of  generating  the  unknown  sequence.  The  data 
consist  of  synthesized  Xpatch  signatures  of  two  targets  which  have  been  difficult  to  separate  with  other 
RTI  algorithms.  The  RTI  algorithm  developed  for  this  thesis  is  clearly  able  to  separate  these  two  targets 
over  a  10  by  10  degree  (1  degree  granularity)  aspect  angle  window  off  the  nose  for  SNRs  as  low  as  0 
dB.  The  classification  rate  is  100  %  for  SNRs  of  5  -  20  dB,  95  %  for  a  SNR  of  0  dB  and  it  drops  rapidly 
for  SNRs  lower  than  0  dB. 


HIGH  RANGE  RESOLUTION  RADAR  TARGET  IDENTIHCATION 


USING  THE  PRONY  MODEL  AND 
HIDDEN  MARKOV  MODELS 


I.  INTRODUCTION 


1.1  Background 

In  air-to-air  engagements  between  high-performance  fighter  aircraft,  positive  and  timely  iden¬ 
tification  of  targets  is  critical.  In  the  absence  of  a  cooperative  identification  system  and  less  than 
adequate  intelligence,  target  identification  must  be  deduced  from  backscattered  electromagnetic  energy 
(30).  Taigets  without  cooperative  identification  systems,  such  as  Identification  Friend  or  Foe  (IFF) 
transponders,  are  called  non-cooperative  targets. 

The  backscattered  or  emitted  electromagnetic  energy  from  a  non-cooperative  target  of  interest 
can  be  detected  by  a  number  of  systems  depending  on  the  characteristics  of  the  signal.  If  the  target 
is  within  visual  range,  the  backscattered  electromagnetic  energy  (light  in  this  case)  can  be  sensed  by 
the  pilot’s  eyes  and  processed  by  the  most  powerful  identification  system  known  -  the  human  visual 
system.  However,  if  the  target  is  at  a  relatively  long  range,  the  backscattered  electromagnetic  energy 
must  be  measured  electronically.  Radar  is  well  suited  for  this  task.  The  target  is  illuminated  with  a 
well  defined  signal  and  the  backscattered  electromagnetic  energy  (radar  return)  is  then  detected  by  a 
receiver  which  is  optimized  to  process  the  radar  return.  Non-cooperative  target  recognition  (NCTR) 
with  radar  is  designated  Radar  Target  Identification  (RTI). 

Historically,  radar  has  been  used  to  estimate  the  relative  range,  direction,  and  velocity  of  targets 
of  interest  (26).  Although  these  parameters  give  significant  information  about  the  target,  they  cannot 
-  by  themselves  -  be  usr  J  to  identify  the  object.  Unlike  the  conventional  radars  which  are  used  to 
estimate  relative  position  <id  velocity.  High  Range  Resolution  (HRR)  radar  is  specifically  designed  to 
resolve  the  target  in  range  along  the  radar-target  vector.  Range  resolution  is  defined  as  the  minimum 
distance  between  two  point  targets  at  which  they  can  discerned  as  two  distinct  targets  (26).  In  other 
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words,  if  the  range  resolution  is  one  meter;  point  targets  separated  by  less  than  a  meter  will  ^pear  as 
one  taiget.  In  general,  the  range  profile  is  created  by  dividing  the  return  into  range  bins  the  size  of  the 
range  resolution  of  the  radar. 

As  stated  previously,  the  radar  illuminates  the  target  and  receives  the  backscattered  signal  to 
yield  the  range  profile,  location,  and  velocity  of  the  target.  The  range  profile  is  sent  to  a  feature 
extraction  process  which  outputs  a  set  of  descriptive  parameters  which  concisely  describe  the  taiget. 
The  approximate  target/radar  aspect  angle  (the  direction  at  which  the  taiget  is  being  illuminated)  is 
estimated  from  the  location  and  velocity  of  the  taiget  via  the  tracking  process.  Finally,  the  decision 
process  chooses  the  taiget  identification  based  on  the  range  profile  feature  set  and  knowledge  of  the 
target/radar  aspect  angle. 

1.2  Problem  Statement 

This  thesis  will  develop  a  reliable  RTI  algorithm  which  uses  a  real  aperture  (as  opposed  to 
synthetic  aperture)  HRR  radar  as  a  sensor.  This  RTI  algorithm  will  be  designed  to  perform  on¬ 
board  identification  during  air-to-air  engagements  between  fighter  aircraft  as  depicted  by  Ingure  1.1. 
For  realistic  operational  scenarios,  the  RTI  algorithm  must  perform  the  identification  process  with  a 
minimum  number  of  measurements  for  near-real-time  taiget  identification. 

1.3  Summary  of  Current  Knowledge 

Research  in  the  area  of  Radar  Taiget  Identification  (RTI)  spans  a  broad  range  of  radar  types 
and  classification  algorithms.  Libby  presents  a  conprehensive  summary  of  the  stait-of-art  of  RTI 
(17).  What  follows  is  an  overview  of  the  research  areas  which  are  most  applicable  to  RTI  algorithms 
developed  for  this  thesis. 

1.3.1  Variability  of  HRR  Radar  Profiles  and  Some  Implications  for  Target  Recognition  Cohen 
(4)  provides  some  valuable  insight  as  to  the  general  nature  of  the  narrow  band  radar  cross  section  (RCS) 
of  complex  targets  such  as  aircraft  and  how  the  RCS  relates  to  the  wideband  HRR  radar  range  profiles 
of  these  targets.  To  quote  Skolnik,  "the  RCS  of  a  target  is  the  (fictional)  area  intercepting  that  amount 
of  power  which,  when  scattered  equally  in  all  directions,  produces  an  echo  at  the  radar  equal  to  that 
from  the  taiget  (26).”  That  is  to  say,  if  a  taiget  reflects  the  same  energy  back  to  the  radar  as  a  perfectly 
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Figure  l.I.  Operational  Scenario 


conducting  sphere  (which  scatters  isotropically)  whose  cross  section  is  1  m^,  the  RCS  of  that  target  is 
Im*. 

The  narrowband  RCS  of  an  aircraft  or  any  other  complex  target  is  extremely  dependent  on  aspect 
angle  and  therefore,  cannot  be  reported  as  a  single  quantity.  Thus,  RCS  is  usually  reported  for  a  single 
frequency  in  a  polar  plot  as  a  function  aspect  angle  (4)  (26).  If  the  RCS  is  modeled  as  a  random  process, 
it  is  not  uncommon  to  find  that  it  can  completely  decorrelate  every  0.2°  change  in  aspect  angle.  Because 
of  this  sensitivity  to  aspect  angle,  Cohen  states  that  RTI  algorithms  may  require  inordinate  antounts  of 
both  data  for  training  and  memory  for  implementation  (4). 
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Cohen  showed  that  the  the  energy  contained  within  the  range  profile  changes  with  the  same 
variability  as  the  narrowband  RCS,  but  general  shape  of  the  wideband  range  profile  is  less  susceptible 
to  changes  in  aspect  than  the  RCS.  However,  the  shape  of  the  range  profile  is  still  very  sensitive  to 
changes  in  aspect  angle.  Therefore,  it  may  be  advantageous,  firom  an  RTI  perspective,  to  normalize  the 
energy  of  the  range  profiles  to  prevent  their  amplitudes  from  changing  as  the  RCS  changes.  However, 
the  loss  of  information  which  is  contained  in  the  absolute  RCS  and  how  it  changes  with  aspect  angle 
may  negate  the  net  improvement  gained  by  this  normalization  (4). 

1.3.2  Recognition  of  Targets  Using  Sequences  of  Range  Profiles  and  Knowledge  of  Aspect 
Angle  Libby,  who  is  currently  conducting  RTI  research  at  AFTT,  has  shown  that  the  aspect  angle  of 
an  aircraft  in  stable  flight  can  be  estimated  to  a  high  degree  using  the  information  available  from  a 
tracking  system  (16).  In  this  light,  he  suggests  that  if  the  aspect  angle  is  known,  target  identification 
will  be  improved  if  the  classifier  excludes  possible  matches  to  targets  whose  templates  are  known  to 
be  outside  the  tolerance  of  the  aspect  angle  estimate.  More  importantly,  Libby  also  states  that  if  the 
classifier  makes  use  of  the  temporal  relationships  between  range  profiles  in  a  sequence,  coupled  with 
the  knowledge  of  the  aspect  angle  of  the  target  during  that  sequence,  the  classification  performance 
will  be  better  than  identification  based  on  individual  range  profiles  treated  as  independent  events.  Also, 
Libby  suggests  that  Hidden  Markov  Models,  which  have  been  used  for  speech  processing,  may  be 
useful  for  processing  these  temporally  related  range  profile  sequences  (17). 

1.3.3  General  Dynamics  RTI  Research  General  Dynamics  (GD)  conducted  RTI  research  using 
HRR  under  contract  with  the  USAFs  Wright  Laboratory  (I )  (28).  The  algorithm  developed  by  GD  was 
based  on  a  "slide  distance  metric”  between  range  profiles  from  a  known  target  and  range  profiles  from 
an  unknown  target.  Range  profiles  were  gathered  for  an  actual  target  at  discrete  angles  and  processed 
to  extract  the  significant  peaks  along  the  radar/target  vector.  This  reduced  data  set  consisted  of  the 
most  prominent  peaks  and  was  stored  as  a  target  template  from  which  the  unknown  range  profiles  were 
matched.  The  "slide  distance  metric”  was  calculated  by  aligning  the  two  range  sweeps  in  each  of  four 
different  ways  -  aligning  the  first  peak  of  the  unknown  peak  with  the  first  peak  of  the  template  sweep, 
aligning  the  second  peak  of  the  unknown  sweep  with  the  second  peak  of  the  template  sweep,  and  so 
on  through  a  fourth  peak-to-fourth  peak  match.  This  peak-to-peak  comparison  was  performed  for  all 
the  stored  target  templates  and  the  target  identification  corresponded  to  the  minimum  sUde  distance. 
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Depending  on  the  complexity  of  the  target,  several  templates  at  different  aspect  angles  must  be  available 
for  each  taiget. 

1.3.4  Target  Identification  Using  Polarization-Diverse  Features  Chamberlain  used  the  fully 
polarized  signatures  to  generate  the  transient  polarization  response  (TPR)  of  the  target  (2).  The  TPR  is 
generated  by  illuminating  the  target  with  an  impulse  of  a  circularly  polarized  wave.  As  the  wave  interacts 
with  each  scattering  center  on  the  target,  each  scattering  center  reflects  a  wave  with  a  polarization  that 
is  determined  by  the  polarimetric  characteristics  of  that  scattering  center.  The  individual  scattering 
centers  were  estimated  nonparametrically  using  the  Inverse  Fast  Fourier  Transform  (IFFT)  of  the 
discrete  or  stepped  frequency  response  of  the  target.  Chamberlain  found  that  the  polarization  derived 
from  the  TPR  related  well  with  the  shape  and  orientation  of  the  major  scattering  centers  distributed 
in  the  downrange  profile  of  the  target.  The  TPR  features  of  five  conunercial  aircraft  range  profiles 
were  classified  using  K-nearest  neighbor  (KNN)  technique.  At  a  single  aspect  angle,  the  correct  target 
identification  was  made  97%  of  the  time  for  signal  to  noise  ratios  (SNR)  as  low  0  dB. 

1.3.5  High  Resolution  Exponential  Modeling  of  Fully  Polarized  Radar  Returns  In  Chamber¬ 
lain’s  woric,  nonparametric  techniques  for  extracting  the  scattering  centers  was  developed.  Sacchini 
and  Steedly  (2S)  (27)  developed  a  parametric  method  to  locate  scattering  centers  which  was  based  on 
the  Prony  Model  (23).  Like  the  research  of  Chamberlain,  each  scattering  center  is  characterized  by 
a  polarization  ellipse  which  corresponds  to  the  back  scattered  polarization  from  a  circularly  polarized 
wave.  Specifically,  the  IFFT  method  and  the  Prony  method  were  compared  (25).  Except  for  conq>uta- 
tional  cost,  the  performance  of  the  Prony  model  was  superior  to  the  conventional  IFFT  techniques.  In 
fact,  the  Prony  model  estimated  the  polarization  ellipses  of  the  dominant  scattering  centers  of  various 
aircraft  with  reasonable  accuracy  at  0  dB  SNR. 

1.3.6  Automatic  Target  Identification  Based  on  Models  of  Neural  Networks  There  are  a  number 
examples  where  neural  networks  have  been  used  for  RTl.  Farhat  (8)  trained  a  neural  netwoilc  with 
sinograms  of  scale  models  of  a  B-S2,  AWACS,  and  the  Space  Shuttle.  A  sinogram  is  a  C!artesian  plot  of 
the  measured  relative  range  of  scattering  centers  on  the  object  versus  aspect  angle.  Faihat  found  that  the 
neural  network  was  able  to  correctly  identify  the  taiget  with  as  little  as  10%  of  the  sinogram.  Although 
the  findings  are  significant,  they  may  be  of  limited  value  because  the  sinogram  used  for  training  and 
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identification  was  generated  by  targets  rotating  smoothly  through  a  range  of  azimuth  angles  at  a  constant 
elevation  angle  about  the  plane  of  the  wings. 

1.3.7  Hidden  Markov  Models  Traditionally,  Hidden  Markov  Models  (HMM’s)  have  been  used 
for  isolated  word  recognition  (IWR)  (11)  (12)  (13)  (IS)  (20)  (21)  (22) .  For  this  purpose,  words  to  be 
identified  are  modeled  using  a  HMM  -  one  HMM  per  word  in  IWR  applications-  and  the  correct  word 
is  then  chosen  by  determining  which  model  had  the  highest  probability  of  generating  the  unknown 
word.  HMM’s  have  been  used  for  IWR  with  excellent  results  when  compared  to  other  techniques  (12) 
(21)  (22) .  It  !q>pears  that  HMM's,  and  in  particular  left-right  HMM’s,  are  well  suited  for  HRR  RTI 
because  the  temporal  relationship  of  scattering  centers  are  inherently  represented  by  the  HMM  state 
transitions  (22). 

1.4  Scope 

Previous  work  will  be  drawn  upon  for  all  phases  of  the  RTI  process  development  The  basic 
theory  of  HRR  radar  has  been  developed  for  over  30  years  (5)  and  equations  derived  firom  this 
knowledge  base  will  be  implemented  in  software.  However,  no  attempt  will  be  made  to  emulate  any 
specific  HRR  radar  that  has  been  implemented  in  hardware.  Also,  the  High  Resolution  Exponential 
Modeling  of  stepped  frequency  range  profiles  developed  by  Sacchini  and  Steedly  (25)  (27)  will  be  be 
used  for  feature  extraction  with  little  iiKxlification.  Therefore,  the  primary  focus  of  this  research  will 
be  on  the  actual  classification  algorithms.  The  RTI  system  developed  for  this  thesis  will  be  designed  to 
identify  two  fighter  class  aircraft  over  the  limited  aspect  angle  window  depicted  by  Figure  1.1. 

1.5  Assumptions 

The  HRR  RTI  problem  is  further  defined  and  scaled  by  the  following  assurrqrtions: 

•  The  tracking  system  can  estimate  the  aspect  angle  to  within  a  10°  by  10°  azimuth  and  elevation 

window  in  the  aspect  angle  space  as  shown  in  Hgure  l.l  (16). 

•  The  range  profile  is  corrupted  by  additive  white  Gaussian  noise  and  is  free  of  other  types  of  noise 


associated  with  clutter  (26). 


•  The  operating  frequency  of  the  radar  is  approximately  10  GHz  so  that  an  antenna  with  adequate 
gain  can  be  mounted  on  the  aircraft  (9). 

•  The  radar  is  locked  on  to  the  taiget  so  that  multiple  range  profiles  will  be  available  for  target 
identification  in  a  relatively  short  period  of  time  (16). 

1.6  Approach  and  Methodology 

Software  systems  are  currently  available  which  -  when  given  a  detailed  description  of  a  complex 
target  such  as  an  aircraft  -  can  accurately  predict  the  range  profile  of  the  target  at  an  arbitrary  aspect 
angle  (3).  However,  the  reverse  process  by  which  the  detailed  structure  of  a  con^lex  target  is 
determined  at  an  arbitrary  aspect  angle  from  its  range  profile  cannot  be  reliably  performed.  The  reverse 
process  is  difficult  to  perform  because  scatterers  at  a  given  range  have  an  arbitrary  phase  so  that  they 
reinforce  or  cancel  stochastically.  Also,  scatterers  may  be  illuminated  by  more  than  one  signal  path 
as  energy  reflects  from  one  scatterer  to  another  before  returning  to  the  radar.  Both  of  the  processes 
described  above  are  very  dependent  on  aspect  angle.  In  general,  processes  which  are  too  complex  to 
be  represented  by  a  deterministic  model  must  be  modeled  as  random  phenomenon  (6).  Therefore,  the 
RTI  system  developed  in  this  thesis  will  be  based  on  the  stochastic  properties  of  the  range  profiles  and 
how  the  range  profiles  change  with  aspect  angle. 

1.6.1  Range  Profile  Feature  Extraction  and  Vector  Quantization  Range  profiles  will  be  pro¬ 
cessed  using  the  Prony  Technique  developed  by  Steedly  and  Sacchini  (25)  (27).  Using  this  techiuque 
each  scattering  center  is  described  by  a  polarization  ellipse,  relative  energy,  frequency  response  and 
range.  The  data  is  reduced  by  vector  quantizing  to  clustering  centers.  The  clustering  centers  will  be 
chosen  using  a  K-means  clustering  algorithrtL 

1.6.2  Target  Identification  In  general,  tai^et  identification  will  made  based  on  the  tenoral 
relationships  between  scattering  centers  of  single  range  profiles  as  well  as  the  temporal  relationships 
between  sequences  of  range  profiles.  One  HMM  per  class  will  be  used  to  nnodel  single  range  profiles 
over  the  entire  aspect  angle  window  of  interest.  Identification  of  unknown  targets  will  be  realized 
by  determining  the  HMM  which  has  the  highest  probability  of  generating  the  unknown  target’s  range 
profile.  If  identification  is  made  based  on  sequences  of  range  profiles,  multiple  HMM’s  per  class  will 


be  used  and  the  identification  decision  will  be  reached  according  to  which  group  of  HMM’s  has  the 
highest  probability  of  generating  the  sequence  of  range  profiles. 

1.6.3  HRR  Radar  Signature  Synthesis  Actual  HRR  radar  signatures  are  not  readily  available. 
Therefore,  synthesized  HRR  radar  returns  will  be  used  for  this  thesis.  A  number  of  software  packages 
that  generate  HRR  radar  range  profiles  are  available.  One  such  package  is  Xpatch  which  calculates 
the  frequency-domain  RCS  of  a  faceted  target  by  geometric  optical  ray  casting  with  multiple  bounce 
technique  (3).  Range  profiles  produced  by  Xpatch  will  form  the  data  set  for  this  research. 

The  Xpatch  range  profiles  are  fully  polarized,  in  that  four  range  profiles  are  generated  for  each 
signature  calculation:  vertical  transmit/vertical  receive;  vertical  transmit/horizontal  receive;  horizontal 
transmit^orizontal  receive;  and  horizontal  transmit^orizontal  receive.  The  four  range  profiles  are 
represented  in  the  ftequency  domain  and  consist  of  1 024 discrete  frequency  samples  ftom  approximately 
9.2S  to  10.75  GHz.  In  the  time-domain,  the  range  profiles  are  divided  into  0. 1  meter  range  bins  covering 
a  total  of  102.4  meters  per  range  profile.  The  incident  plane  wave  used  to  calculate  the  signatures  in 
Xpatch  is  non-causal  and  therefore,  is  not  physically  realizable  in  hardware(18). 

In  other  words,  the  signature  generated  by  Xpatch  is  essentially  the  in^ulse  response  of  the 
target  over  the  l.S  GHz  bandwidth.  Therefore,  more  realistic  signatures  can  be  obtained  by  adding 
white  Gaussian  noise  to  the  signatures  and  convolving  them  with  a  transfer  function  of  a  causal  HRR 
radar  such  as  a  Linear  Frequency  Modulated  pulse  compression  (26)  HRR  radar  whose  bandwidth  is 
less  then  1.5  GHz. 

1.7  Research  Objectives 

This  thesis  will  investigate  means  to  exploit  HRR  radar  range  profiles  for  the  purpose  of  target 
identification.  The  focus  of  the  research  will  be  on  an  efficient  and  reliable  means  to  represent  a  given 
target  using  Prony’s  Model  and  HMMs.  The  specific  objectives  of  this  research  effort  are  follows; 

•  To  develop  a  reliable  RTI  algorithm  which  is  designed  to  identify  targets  using  a  Linear  FM 
pulse  compression  HRR  radar  as  a  sensor. 

•  To  characterize  the  RTI  algorithm  by  measuring  its  susceptibility  to  additive  white  Gaussian 
noise  (AWGN)  and  range  alignment  errors  caused  by  iinperfections  in  the  tracking  system. 
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1.8  Thesis  Organization 

The  organization  of  this  thesis  is  given  below. 

•  The  detailed  background  material  pertaiiting  to  the  major  con^nents  of  the  RTI  algorithm 
developed  for  this  thesis  is  reviewed  in  Chapter  n. 

•  The  RTI  algorithm  description  is  found  in  Chapter  IH. 

•  Chapter  IV  contains  the  experimental  procedures  and  results. 

•  The  conclusion  and  recommendations  are  given  in  Chapter  V. 

1.9  Summary 

In  the  fast  paced  environment  of  air-to-air  combat,  positive  and  timely  target  identification  is 
critical.  Although  HRR  radar  is  possible  with  current  technology,  RTI  using  HRR  radar  signatures  is 
an  unsolved  problem.  Current  research  has  shown  that  features  based  on  the  polarization  of  the  target’s 
scattering  centers  correspond  well  to  the  physical  structures  of  the  target  Therefore,  these  features 
should  be  useful  for  RTI.  Given  an  accurate  feature  set  and  a  reliable  identification  algorithm,  RTI  in  a 
realistic  combat  environment  will  become  a  reality. 


//.  BACKGROUND 


2.1  Introduction 

Radar  Target  Identification  (RTI)  research  spans  a  broad  range  of  radar  types  and  classification 
algorithms.  This  background  material  presented  in  this  chapter  will  focus  on  RTI  based  on  High  Range 
Resolution  (HRR)  radar  signatures.  The  following  areas  will  be  reviewed: 

•  HRRradar 

•  High  resolution  exponential  modeling  of  fiilly  polarized  radar  returns 

•  Hidden  Markov  Models 

•  Classification  performance  clarification 

2.2  High  Range  Resolution  (HRR)  Radar 

As  stated  in  Chapter  1,  HRR  radar  is  specifically  designed  to  resolve  the  target  into  many  parts 
as  a  function  of  range  along  the  radar-target  vector.  The  range  from  the  radar  to  the  target  along  the 
radar-target  vector  is  determined  by  measuring  the  time  delay  between  the  transmitted  signal  and  the 
received  backscattered  signal  from  the  target  The  range  is  computed  using  the  propagation  velocity 
of  the  transmitted  signal.  Hence,  the  range  is  given  by 


r  =  range:  m 

c  =  speed  of  light  in  free  space:  3  x  10^  m/s 
U  =  tinie  delay;  s. 

The  2  in  the  denominator  accounts  for  the  propagation  of  the  transmitted  signal  to  and  from  the  target. 
If  a  certain  radar  transmits  a  rectangular  pulse  with  a  pulse  width  r  seconds,  the  range  increment  or 
range  resolution  is  Sr  =  cr/2,  and  two  point  reflectors  separated  by  distances  greater  than  Sr  can  be 
resolved  as  two  distinct  targets.  The  bandwidth  S  of  a  r  second  rectangular  pulse  is  approximately 
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B  =  \/t,  and  the  range  resolution  is  usually  expressed  in  terms  of  B  so  that  (26) 


(2.2) 


6r  =  range  resolution 
B  =  bandwidth  of  the  transmitted  signal. 

Obviously,  HRR  can  be  achieved  by  transmitdng  a  veiy  narrow  pulse  (r  =  667  ps  for  6r  =  0.1  m), 
but  the  maximum  range  of  a  narrow  pulse  HRR  radar  is  linuted  by  the  low  energy  of  the  transmitted 
pulse. 

Therefore,  HRR  radar  design  is  constrained  by  two  opposing  parameters:  the  energy  and  band¬ 
width  of  the  transmitted  pulse.  Increasing  the  transmitted  pulse  width  increases  the  energy  but  also 
decreases  the  bandwidth  which  in-tum  decreases  the  range  resolution.  Two  schemes  for  realizing  HRR 
radar  capable  of  detecting  targets  at  ranges  suitable  for  RTI  are  discussed  below. 

2.2.1  Synthetic  HRR  Radar  As  shown  by  Equation  (2.2) ,  6r  is  limited  by  B.  Synthetic  HRR 
radar  realizes  the  necessary  bandwidth  for  HRR  by  measuring  the  steady  state  response  of  the  taiget  at 
discrete  frequencies  over  the  required  bandwidth.  Thus,  the  frequency  domain  response  of  the  taiget  is 
represented  by  a  set  of  scattering  coefficients  which  are  nteasured  at  discrete  frequencies 

S  [(/*  -  /min) A/1  =  S[k],  A:  =  0, 1, 2 . N  -  1  (2.3) 


k  =  integer  index 
N  =  number  of  samples 
/*  =  frequency 

/min  =  minimum  frequency  of  the  sequence 

f,  =  sampling  rate  in  the  discrete  time  domain  (at  least  the  bandpass  Nyquist  rate) 
A/  =  frequency  step  size:  ^ 


the  range  resolution  expressed  in  terms  of  A/  and  N  is 


6r 


c 

2{N  -  l)Af 


(2.4) 


The  range  profile  of  the  target  is  the  inverse  Discrete  Fourier  Transform  (DFT)  of  5[A;],  or 

1 

s[n]  =  n  =  0,1,2,...,  N  -1.  (2.5) 

^  k=0 

In  the  discrete  time  domain,  tj  =  n//«  and  the  discrete  range  is 


cn 


(2.6) 


While  the  classical  definition  of  unambiguous  range  is  related  to  the  pulse  repetition  time  (prt)  or  the 
time  between  transmitted  pulses  of  the  radar,  because  5[n]  is  periodic  with  a  period  of  N,  the  maximum 
unambiguous  range  for  synthetic  HRR  radar  is  limited  to 


_  cN  _  c 

2A7‘ 


(2.7) 


If  a  range  resolution  of  0.1  meters  is  desired,  by  Equation  (2.2)  B  must  be  l.S  GHz  and  f,-  for  the 
bandpass  Nyquist  rate  -  must  also  be  at  least  1.5  GHz.  For  N  =  1024,  the  maximum  unambiguous 
range  Re  is  only  102.4  meters.  If,  for  example,  the  target  is  at  SO  Km,  5[A;]  must  be  measured  at 
the  proper  time  delay.  That  is,  5[A;]  is  windowed  in  time  by  a  window  that  is  centered  on  the  target 
However,  the  range  extent  of  the  target  must  be  less  then  Re  to  prevent  aliasing  in  the  discrete  range 
domain. 

Synthetic  HRR  radar  is  very  useful  for  test  facilities  where  the  target  is  stationary  because 
it  essentially  measures  the  finite  bandwidth  impulse  response  of  the  target  However,  the  utility  of 
synthetic  HRR  radar  for  RTI  of  aircraft  outside  the  laboratory  is  limited  because  the  aircraft  would 
generally  be  at  a  different  location  and  aspect  angle  for  each  frequency  measurement  It  is  possible, 
however,  to  approximate  the  impulse  response  of  the  target  over  a  finite  bandwidth  if  the  transmitted 
signal  is  relatively  wide  pulse  which  is  continuously  swept  across  the  frequency  band  (26).  This  process 
is  described  in  the  following  section. 
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2.2.2  Linear  FM  Pulse  Compression  As  stated  previously,  HRR  design  is  constrained  by 
the  energy  and  bandwidth  of  the  transmitted  pulse.  Pulse  compression  HRR  radars  realize  HRR  by 
modulating  a  relatively  wide,  and  therefore,  high  energy  pulse  with  a  signal  that  has  symmetric,  narrow 
peaked  autocorrelation  properties.  Two  such  examples  of  signals  with  these  autocorrelation  properties 
are  the  Barker  code  and  linear  FM  modulation  waveforms  (26).  What  follows  is  a  description  of  pulse 
compression  using  a  linear  FM  modulated  or  ‘chirped’  transmitted  pulse  employed  by  many  riKxlem 
HRR  radars  (26).  The  derivations  of  the  linear  FM  pulse  compression  equations  stated  below  can  be 
found  in  reference  (5).  Analytically,  the  transmitted  pulse  is 


fit) 


cos 


(2-r/,  [«-!]+/ 


2irB 


fm 


cos  (^(t))rect 


(2.8) 


fc  =  carrier  frequency 
T  =  pulse  width 
Bfm  =  bandwidth 


The  instantaneous  frequency  for  0  <  t  <  r  is  ^  or 


/.„.*(«)  =  0<t<T.  (2.9) 

Therefore,  the  frequency  of  the  pulse  is  linearly  swept  from  (fc  —  Bf„/2)  to  (fc  +  Bfml^).  Taking 
the  Fourier  Transform  of  f(t)  to  represent  the  transmitted  signal  in  the  frequency-domain  gives 


F(f)  = 


5V^{lC(x,)  -1-  C(x,)f  -b  [5(x,)  -b  5(i2)f 


n  irrk 


2-4 


where 


_  2TtBfm 
T 

f  +2^(/e-/) 

L-2nUc-f) 


and  S{x)  and  C{x)  are  Fresnel  integrals 


Cix) 

Six) 


da 

da. 


Recall  that  the  transmitted  signal,  F(/),  has  symmetric,  narrow  peaked  autocorrelation  prop¬ 
erties.  Thus,  filtering  the  received  signal  with  a  filte'*  matched  to  Fif)  compresses  the  the  received 
pulses.  The  matched  filter  output,  which  is  denoted  as  5(/)  =  Fif)F*if),  is  a  san^ling  function 
in  the  discrete  time  domain  whose  first  null  is  at  l/J3/m.  Therefore,  the  range  resolution  is  c/2Bfm 
and  the  transmitted  pulse  appears  to  be  an  unmodulated  rectangular  pulse  which  has  a  pulse  width  of 
r  =  1/B/m  seconds.  Hence,  the  pulse  compression  ratio  is  expressed  as  t/T.  In  general,  a  higher 
pulse  compression  ratio  decreases  the  magnitude  of  U  which  increases  the  sharpness  of  the  roll  off  of 
Fif)  (5). 


2.3  High  Resolution  Exponential  Modeling  of  Fully  Polarized  Radar  Returns 

Scattering  centers  along  the  range  profile  of  HRR  radar  signatures  can  be  modeled  parametrically 
by  an  exponential  model  (25)  (27).  This  model  is  referred  to  as  Prony’s  model.  Lsing  Prony’s  model, 
the  range  profile  of  the  target  is  described  by  T  scattering  centers,  each  characterized  by  |p(,  |  (frequency 
response),  ARt^  (polarization  ellipse  axis  ratio),  (polarization  ellipse  tilt  angle),  Et^  (energy),  and 
ri,  (range).  The  parameter  estimation  techniques  developed  by  Sacchini  and  Steedly  have  overcome 
the  adverse  affects  of  noise  which  usually  limits  the  usefulness  of  the  Prony  model  (25)  (27). 

It  is  assumed  that  the  the  frequency  domain  response  of  the  target  is  represented  by  a  fully 
polarized  set  of  scattering  coefficients  which  are  measured  (for  synthetic  HRR)  or  sampled  (for  Linear 
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FM  Pulse  Compression  HRR)  at  discrete  frequencies  A:  =  0, 1, 2, N  —  1.  These  coefficients  are 
denoted  as  ,  5„h[A;],  5hv[A;],  and  5/,h[A;],  where  the  first  subscript  corresponds  to  the  receive 

polarization  (h  for  horizontal  and  v  for  vertical  polarization)  and  the  second  subscript  corresponds  to  the 
transmit  polarization.  The  four  transmit/receive  polarization  combinations  can  be  used  to  synthesize 
any  arbitrary  polarization  desired.  Left  circular  polarization  is  synthesized  by  the  following  transform 
(29): 

(  ^  '  'l  i.  ,2,1) 

The  subscripts  (hi)  and  (vl)  on  the  left  side  of  Equation  (2.1 1)  correspond  to  a  horizontally  received 
and  vertically  received  left  circularly  polarized  transmitted  signal. 

As  shown  in  Section  2.2.1,  the  range  profile,  Szy[r„]  (xy  is  hi  or  vl),  can  be  found  using 
Equations  ( 2.S)  and  ( 2.6),  and  Equation  ( 2.5)  is  efficiently  confuted  with  a  Fast  Fourier  Transform 
(FFT)  routine.  Except  for  computational  cost,  the  performance  of  the  Prony  model  is  superior  to  the 
conventional  Inverse  Fast  Fourier  Transform  (IFFT)  techniques  because  it  yields  higher  range  resolution 
for  the  same  bandwidth  and  it  models  the  frequency  response  of  individual  scattering  centers.  Using 
the  IFFT  to  calculate  range  profiles,  each  range  resolution  cell  (n  index)  is  modeled  with  a  constant 
frequency  response  over  all  frequencies  A;  =  0, 1, 2, N  ~  1  because  the  kernel  of  (  2.S)  has  a 
magnitude  of  1.  Although  a  perfect  sphere  has  a  constant  frequency  response,  realistic  scattering 
centers  do  not  display  a  constant  frequency  response.  For  example,  the  frequency  response  of  a 
trihedral  increases  linearly  as  frequency  increases  (25).  In  this  section  the  definitions  of  the  Prony 
model  parameters  and  the  procedure  to  estimate  these  parameters  is  presented. 


2.3.  J  Pmny's  Model  The  Prony  model  (23)  of  the  fully  polarized  frequency  data  given  in 
Equation  (2.11)  is 


/  SH,[k] 
\  S„,\k] 


fc  =  0,l,2, 


N-l. 


(2.12) 
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where 


Pi.  =  pole  of  the  n"*  scattering  center  (generally  complex) 

Ohi,  =  amplitude  coefficient  of  the  horizontally  received  scattering  center 
a„t.  =  arr^)iitude  coefficient  of  the  vertically  received  n''*  scattering  center 
T  =  number  of  scattering  centers 
N  =  number  of  discrete  frequencies 

Clearly,  Equation  (2.12)  is  similar  to  Equation  (  2.S).  If  the  n*'*  pole  corresponds  to  an  ideal  point 
scatterer,  |pt.|  =  1,  but  for  more  realistic  scatterers  with  frequency  responses  that  are  not  constant, 
Ipi.I  will  vary  slightly  around  1.  The  relative  range  of  the  n*’'  scattering  center  is  associated  with  the 
angle  of  pt.,  Zpi,,  and  is  computed  as 

0<r,,<  Re.  (2.13) 

Re  is  the  unambiguous  range  given  by  Equation  (  2.7).  The  energy  of  the  scatterer  is  calculated  as 
Eu  =  (KrJ*  +  K.,nEL'o  br.r  n=l,2,...,r  (2.14) 

The  polarization  information  is  contained  in  the  vertical  and  horizontal  amplitude  coefficients, 
a/,1,  and  a„t,.  The  polarization  of  the  n*^  scattering  center  is  an  ellipse  which  can  be  represented 
by  the  Poincar6  Sphere  (14).  The  polarization  ellipse  parameters  are  calculated  with  the  following 
equations: 


7«» 

=  tan  * 

(H- 

0  <  7.,  <  f 

(2.15) 

\ahtj 

2 

= 

- 

— TT  <  ^f,  <  TT 

(2.16) 

1  .  _ 
=  -sm 

[8in(27,, 

,)sin(5,.)],  -J<e,. 

(2.17) 
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A  complete  and  compact  description  of  the  shape  of  the  polarization  ellipse  is  given  by  the  dlt  angle, 
Tt,,  and  the  major  to  minor  axis  ratio  ARt,. 


Tt,  =  \  tan"*  [tan(27,J  008(6,.)] ,  0  <  t,.  <  J 


(2.18) 


ARt,  =  cot(e<,),  — oo  <  ARt^  <  +oo 


(2.19) 


Although  r,.  ranges  from  0  to  tt,  t,.  is  limited  to  ‘itf2  in  (  2.18)  because  only  one  quarter  of  the 
Poincard  Spl^re  is  used.  To  avoid  this  ambiguity,  the  following  alterations  to  r,.  need  to  be  made  (27): 


Tt. 


Tu  +  f  if  It.  >  7 

T<.  +  TT  if  7,.  <  7  and  <  0 


(2.20) 


For  left-handed  polaiizadon  c,.  >  0  and  0  <  <  -foo. 

The  set  of  parameters  ri,;n  =  1,2  -  •  •  ,r}  forms  a  concise  descripdon 

of  each  scattering  center  of  the  target.  The  target  is  described  by  T  such  scattering  centers. 


2.3.2  Parameter  Estimation  Parameter  esdmadon  is  required  to  esdmate  the  poles  and  am¬ 
plitude  coefficients  from  the  frequency  data.  The  following  algorithm  for  esdmadng  the  poles  and 
amplitude  coefficients  was  developed  at  Ohio  State  University  (25)  (27).  The  poles  are  esdmated 
using  a  backward  linear  predicdon  approach  which  uses  a  Singular  Value  Decomposidon  (SVD).  The 
backward  linear  predicdon  equadon  simultaneously  uses  both  the  horizontal  and  vertical  polarizadons 
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and  is  written  as 

^  5m[0] 

5/./11] 


S„,[0] 

5„,[1] 


Shi[1] 

5wl2] 

..  5m[Q1 

Sh,[2] 

Sw[3] 

-1- 1] 

Sh,[N  -  Q] 

54^-Q  +  l)  .. 

..  Sh,[N-1] 

5„,[1] 

5„,[2] 

S„,[Q] 

5„,[2] 

5w[31 

..  S„,[Q  +  1] 

S.i[N  -  Q] 

S„,[N-Q  +  1]  ., 

..  S„,[N-1] 

1 
bx 
62 

\»«y 


0  (2.21) 


or 


1 

b 


0. 


(2.22) 


is  the  number  of  discrete  frequencies,  Q  is  the  prediction  order,  and  b  is  the  coefficient  vector  of  the 
polynomial  B(z)  given  by 


Biz)  =  1  +  6i2  ^  +  b2Z~^  +  . . .  +  bQZ~^ 


(2.23) 


If  Biz)  describes  an  all  pole  filter  in  the  z-transform  domain  that  has  Q  poles.  The  prediction  order, 
Q,  is  ideally  an  integer  greater  than  the  model  order  T.  Prior  to  solving  b,  5  is  expanded  by  a  SVD  so 
that 


5  =  l/SV" 

where  V"  denotes  the  Hermitian  transpose  (or  conjugate  transpose)  of  V  and 


(2.24) 


(  fr.  0  ^ 


E  = 


0\ 


02 


\  0 


(2.25) 


<^Q+l  } 


2-9 


The  singular  values  of  5  are  ordered  form  largest  to  smallest  (  <ti  >  02  >  •  •  •  >  S  is  noise 

cleaned  by  keeping  the  first  T  singular  values  and  setting  the  remaining  singular  values  to  zero  so  that 

f" »  1 


S  = 


Ox 


(2.26) 


Note  that  S  has  the  same  dimensions  as  £.  Now  the  noise  cleaned  version,  S,  is  formed  by 


s  =  utv” 


(2.27) 


Now  S  is  written  as  [5 1  52]  where  5 1  is  a  column  vector  consisting  of  the  first  column  of  S  and 
5  2  is  a  matrix  consisting  of  the  remaining  columns  of  S .  The  estimate  of  6,  b ,  is  then  found  by  using 
a  least  squares  solution,  which  is  written  as 


(2.28) 


Finally,  the  estimated  poles  are  found  by  solving  for  the  roots  of  B{z)  as 


Pi  = 


root,(B(t)) 


,  g  =  l,2,...,Q. 


(2.29) 


Although  there  are  Q  roots  of  B{z),  only  T  of  those  roots  corresponds  to  data  modes  because  only  T 
singular  values  of  S  are  nonzero.  Determining  which  of  the  Q  poles  are  valid  is  accoitq)lished  in  two 
steps.  First,  poles  that  do  not  fit  the  following  criteria  are  discarded. 


2 

100 


—  <  |p,r  <  100. 


(2.30) 
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The  above  criteria  has  been  found  to  model  radar  data  well  (27).  After  the  invalid  poles  have  been 
discarded,  the  T  poles  with  highest  energy  are  retained.  From  Equation  (2.14),  the  energy  calculation 
requites  both  the  horizontal  and  vertical  amplitude  coefficients. 

Writing  Equation  (2.12)  in  matrix  form  gives 


'  p\  pI  ...  '' 

P?  ^ 

...  P^-  > 


a„Q'  j 


^  5m[0]  5„/{0]  ^ 

^  Shi[N  -  1]  S„i[N  -  1]  ^ 


(2.31) 


or 


Pq.Aq.  =  Sa  (2.32) 

Q'  is  the  number  of  poles  which  satisfy  the  criterion  of  Equation  (  2.30).  A  is  found  using  a  least 
square  solution  as 


P  QiSa- 


(2.33) 


The  energy  of  each  of  the  Q'  poles  is  calculated  using  Equation  (2. 14)  and  the  T  poles  with  the  highest 
energy  ate  retained.  The  parameter  estimation  procedure  is  terminated  by  teestimating  the  amplitude 
coefficients  of  the  T  true  poles  such  that 


A 


T 


(2.34) 


2.4  Hidden  Markov  Models 

HMM’s  have  been  explored  by  a  number  of  researchers  since  1975  (19).  Rabiner  (1 1 )  (13)  (20) 
(21X22)  and  Levinson  (IS)  have  written  HMM  tutorials  which  have  been  summarized  in  Voice  and 
Speech  Processing  by  Parsons  (19).  The  purpose  of  this  section  is  to  present  the  general  HMM  theory 
^>plicable  to  RTI.  Most  of  the  HMM  theory  in  this  section  is  drawn  from  Rabiner’s,  A  Tutorial  on 
Hidden  Markov  Models  and  Selected  Applications  in  Speech  Recognition  (22)  and  the  notation  will, 
for  the  most  part,  correspond  to  Rabiner’s  notation  in  that  article. 
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2.4. 1  HMM  Background  Real-world  processes  produce  observable  outputs  which  are  charac¬ 
terized  as  sigiuds.  In  a  broad  sense,  signal  models  can  be  separated  into  two  classes:  deterministic  and 
statistical.  A  sine  wave,  for  example,  is  characterized  by  a  deterministic  signal  model  that  is  completely 
specified  by  amplitude,  frequency,  and  phase;  all  of  which  are  fixed  parameters.  On  the  other  hand, 
statistical  signal  models  characterize  the  signal’s  statistical  properties.  Gaussian  and  Poisson  signal 
models  are  two  well  known  statistical  signal  models  (6).  The  underlying  assumption  of  any  statisti¬ 
cal  model  is  that  the  signal  can  be  well  characterized  as  a  parametric  random  process,  and  that  the 
parameters  of  this  stochastic  process  can  be  determined  (estimated)  in  a  precise,  well-defined  manner 
(22). 

The  HMM  is  also  a  statistical  signal  model  and  is  defined  as  (21), 


.  a  doubly  stochastic  process  with  an  underlying  stochastic  process  that  is  not  observed 
(it’s  hidden),  but  can  only  be  observed  through  another  set  of  stochastic  processes  that 
produce  the  sequence  of  observed  symbols.” 

In  this  section  the  theory  of  Markov  chains  will  be  presented  and  then  extended  to  HMM’s.  After  the 
concepts  of  HMM’s  are  introduced  three  fundamental  problems  for  HMM  design  will  be  discussed  in 
detail. 


2.4.2  Markov  Chains  A  Markov  chain  is  a  system  which  is  described  at  any  time  as  being  in 
one  of  a  set  of  iV  distinct  states,  5i ,  ,  Sat.  The  system  experiences  a  change  of  state  (possibly 

back  to  the  same  state)  at  evenly  spaced  discrete  times  t  =  The  active  state  at  time  t„  is 

denoted  as  qt^.  The  next  state,  9t.+,  >  is  determined  by  the  state  transition  probabilities  associated  with 
the  current  state,  g(„.  The  state  transition  probabilities,  Oij,  are  written  as 

=5,|g..=5.),  l<i,j<N.  (2.35) 

Where  AT  is  the  number  of  states,  5,-  is  the  current  state,  and  5,  is  the  next  state.  The  state  transition 
coefficients  must  satisfy  standard  stochastic  constraints  so  that 

0  <  a.j  <  1  (2.36) 
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Figure  2. 1 .  Three  State  Markov  Chain  to  Model  the  Weather 

N 

=  1.  (2.37) 

r=i 


Each  state  in  the  above  process  corresponds  to  an  observable,  physical  event.  Rabiner  clarifies 
the  concepts  of  the  Markov  chain  by  illustrating  a  simple  3-state  model  of  the  weather  which  is  depicted 
in  Hgure  2.1  (22) .  The  weather  is  observed  at  the  same  time  each  day  as  one  of  the  following: 


State  1: 

rainy  or  (snowy) 

State  2: 

cloudy 

State  3: 

sunny. 
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The  state  transition  probabilities  for  times  tn  >  U  wt  compactly  represented  by  the  >1  matrix  as 
follows 


A  =  (a.j) 


011  “12  Ol3 

O21  O22  023 

^  O3I  O32  O33  j 


(2.38) 


The  initial  state  probabilities  at  time  f  1  are  similarly  represented  with  the  following  notation 


TTi  =  P  (g,,  -  Si),  l<i<  N 


(2.39) 


and  are  also  represented  in  matrix  format  using  the  column  matrix  tt  as  follows; 


2r  =(7ri) 


7r2 

^  ^3  J 


(2.40) 


Now  consider  the  following  weather  observations  over  a  5  day  period,  “rainy  -  sunny  -  cloudy  -  sunny 
-  rainy”.  More  formally  the  observation  sequence  O  is  stated  as  O  =  {81,83,82,83,81}.  The  state 
transitions  are  independent  by  the  Markov  property  (22),  so  the  probability  of  O  given  the  naxlel  is 
expressed  as 


P(OlModel)  =  P(5i,53,52,53,5i|Model) 


(2.41) 


=  P(5,)  •  P(53|5,)  .  P{S2\S3)  •  P(S3l52)  •  P(5,  IS3)  (2.42) 


—  TTi  •  ai3  •  032  •  <*23  ’  <*31 


(2.43) 


Befcxe  extending  these  ideas  to  HMM’s,  it  is  iirqportant  to  understand  that  the  states  of  the 
Maikov  chain  are  explicitly  specified  by  the  observed  weather.  That  is  to  say  that  if  the  weather  is 
observed  as  sunny,  the  model  is  in  state  3  (53).  Therefore,  the  Markov  chain  is  conpletely  visible  from 
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the  observed  sequence.  As  will  be  described  in  the  next  section,  the  states  of  a  HMM  are  not  specified 
by  the  observed  sequence  -  they  are  hidden. 


2.4.3  Extension  to  HMM’s  Rabiner  introduces  the  concepts  of  HMM’s  using  several  biased 
coin  toss  experiments  and  the  classic  urn  and  ball  experiment  (21)  (22).  Instead  of  summarizing 
those  experiments  here,  an  experiment  using  weighted  die  will  be  demonstrated  to  provide  a  broader 
understanding  of  basic  HMM  concepts,  when  combined  with  Rabiner’s  examples. 

Consider  a  system  which  produces  an  integer  number  between  1  and  6  at  evenly  spaced  time 
increments.  A  typical  observed  sequence  might  be 

O  =  Oi^  Ot^  Otj  •  •  •  OtT 
=  34  1  •••2. 

Given,  the  above  set  of  observations,  the  problem  of  interest  is  to  build  a  HMM  to  explain  the  observed 
sequence  of  integers  (22).  Obviously,  there  are  numerous  mechanisms  which  could  account  for  the 
above  observations.  One  of  which  is  a  single,  fair  die,  but  for  this  exaiTq)le,  let  the  system  be  conprised 
of  3  biased  die  which  are  hidden  from  the  observer  by  a  barrier.  Each  die  is  biased  as  follows: 

Diel  Die  2  Die  3 


Pr(l) 

=  0.3 

Pid) 

=  0.05 

P3(l) 

=  0.7 

Pi  (2) 

=  0.15 

P2(2) 

=  0.1 

P3(2) 

=  0.025 

Pi  (3) 

=  0.15 

P2(3) 

=  0.6 

P3(3) 

=  0.1 

Pi  (4) 

=  0.1 

P2(4) 

=  0.05 

P3(4) 

=  0.05 

Pi(5) 

=  0.2 

P2(5) 

=  0.1 

P3(5) 

=  0.025 

Pi  (6) 

=  0.1 

P2(6) 

=  0.1 

P3(6) 

=  0.1 

The  above  weights  were  chosen  arbitrarily,  but  the  combination  of  weights  for  each  die  must  sum  to  1 . 

According  to  some  random  process,  an  initial  die  is  chosen  and  rolled.  The  outcome  of  the  roll  is 
recorded  as  the  first  observation  at  time  U .  A  new  die  is  then  chosen  (possibly  the  same  die)  according 
to  the  random  process  selection  associated  with  the  current  die.  At  the  next  time  increment,  the  new 
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die  is  rolled  and  its  output  is  recorded  as  the  second  observation.  A  new  dice  is  chosen  based  on  the 
current  die  and  the  process  is  repeated. 


In  a  straight  forward  way  the  system  can  be  modeled  with  a  three  state  {N  =  3)  HMM,  one  for 
each  die.  The  initial  state  (the  die  chosen  at  ti)  and  the  state  transitions  (the  new  die  chosen  based 
on  the  current  die)  can  be  represented  as  the  tt  (2.40)  and  A  (2.38)  matrices  introduced  for  Markov 
chains  in  the  previous  section.  For  this  example  let 


TT  =  (tT.)  = 


I  0.4 
0.3 
0.3 


(2.44) 


and 


A  =  (aij)  = 


V 


0.3 

0.5 

0.2 

0.6 

0.2 

0.2 

0.1 

0.2 

0.7 

(2.45) 


Clearly,  the  structure  of  the  HMM  is  similar  to  that  of  the  Markov  chain.  The  primary  difference  between 
the  two  is  that  the  observation  ou^uts  do  not  uniquely  correspond  to  any  given  state.  TherefcHe,  to 
fully  describe  a  HMM  the  observation  probabilities  must  be  specified  for  each  state.  The  following 
notation  is  used 


6,(m)  =  P{vm  at =  Si),  \  <i<  N  I  <m<  M  (2.46) 

Where  is  the  element  of  the  output  symbol  set,  V  =  {ui ,  t;2,  •  •  •  >  },  and  Af  is  the  number 

of  symbols  in  the  output  symbol  set.  For  the  current  example,  Af  =  6  ,  Vj  =  1  ,  V2  =  2,  •  •  •,  and 
v«  =  6.  As  with  A  and  rr  the  b,'(m)’s  are  also  written  as  elements  of  a  S  matrix.  The  probabilistic 
weights  of  the  dice  are  expressed  in  the  B  matrix  as 


B  =(6.(m)) 


0.3 

0.15 

0.15 

0.1 

0.2 

0.1  ’ 

0.05 

0.1 

0.6 

0.05 

0.1 

0.1 

(2.47) 

0.7 

0.025 

0.1 

0.05 

0.025 

0.1  ; 
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The  HMM  is  completely  specified  by  N,  M,  A,  B,  and  ir  and  for  convenience  the  compact  notation 


A  =  (A,B,rr) 


(2.48) 


is  used. 

The  model.  A,  is  generally  used  to  describe  an  observed  sequence  but,  it  can  also  be  used  to 
generate  an  observation  sequence  O  that  has  T  observations  as  follows  (22): 

1.  Choose  an  initial  state  qt^  =  5,  according  to  the  initial  state  distribution  ir. 

2.  Setf  =  ti. 

3.  Choose  =  Um  according  to  the  symbol  probability  distribution  in  state  Si. 

4.  Transition  to  a  new  state  state  =  Sj  according  to  the  state  transition  probability  distribution 
for  state  5,-. 

5.  Set  f  ;  return  to  step  (3)  if  <  <  tj;  otherwise,  terminate  the  procedure. 

This  procedure  was  used  to  generate  3  observation  sequences,  O*’,  with  T  =  10  observations 
each.  The  superscript  k  denotes  the  sequence  number.  The  3  observation  sequences  are 

O*  =  531343355  1 
=  2323346156 
O’  =  336325111  1. 

From  the  perspective  of  the  observer,  who  can  only  see  the  observation  sequences,  the  state  of  the 
model  at  f  as  well  as  the  weights  of  the  individual  dice  ate  not  readily  apparent. 

2.4.4  HMM  Design  As  previously  mentioned  there  ate  three  fundamental  problems  for  HMM 
design.  These  problem  are  given  below  using  the  notation  presented  in  the  previous  sections  (21 )  (22). 

1.  Given  the  observation  sequence  O  =  Ot,  O,,  •  •  •  O,,.,  and  a  nnodel  A  =  {A,  B,  ir).  how  can 
P(0|A),  the  probability  of  the  observation  sequence,  given  the  model,  be  calculated  efficiently? 
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2.  Given  the  observation  sequence  O  =  0|,  0|,  •••  and  a  model  A,  how  can  a  corresponding 

state  sequence  Q  =  qt,  Qtj  •••  qtr  ^  chosen,  which  is  optimal  in  some  meaningful  sense  (i.e., 
best  “explains”  the  observations)? 

3.  How  can  the  model  parameters,  A  =  {A,  B,  ir),  be  chosen  to  maximize  P(0|A)? 

2.4.5  Observation  Probability  Given  the  Model  The  probability  of  an  observation  sequence 
O,  given  the  model  A  (P(0|A))  is  a  measure  of  how  well  a  given  model  matches  an  observation 
sequence.  Therefore,  P(0|  A),  is  a  way  of  choosing  the  best  model  among  several  competing  models 
(21)  (22).  For  isolated  word  recognition  (IWR),  a  HMM  is  synthesized  for  each  word  to  be  identified. 
The  identification  of  an  utterance  of  an  unknown  word  is  accompUshed  by  choosing  the  word  whose 
HMM  has  the  highest  P(0|A). 

P(0|A)  is  computed  in  a  straight  forward  way  by  taking  the  sum  of  the  joint  probabilities 
of  every  state  sequence  Q'  =  9^,  qt^  •••  qt^.  (21)  and  the  observation  sequence  O  of  interest 
P{0,  Q'|A)  can  be  computed  as  the  product  of  P{0\Q‘,  A)  and  P(Q'|A).  It  follows  that 

P(0,  Q'|A)  =  P(0|Q',  A) .  P(Q'IA).  (2.49) 

The  second  term  on  the  right  side  of  (2.49)  is  calculated  exactly  the  same  way  as  P(0|Model)  in 

Equation  (2.41)  for  the  Markov  chain  such  that 

P(Q  |A)  =  (2.50) 

The  calculation  of  P{0\Q',  A)  is  also  straight  forward. 

P(0|g',A)  =  6,.,(0„)6,.,(0.,)--6,.^(0„).  (2.51) 

Hnally,  P(0|A)  is  obtained  by  summing  P(0,  Q'|A)  over  all  possible  state  sequences; 

P(0|A)  =  J]P(0|(?',A)P«?'|A).  (2.52) 

all  ( 
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Although  the  calculation  of  P(0 1  A)  directly  using  (2.S2)  is  straight  forward,  it  is  computationally 
unfeasible  even  for  small  values  of  N  and  T,  since  there  are  (2r  - 1 )  multiplications  per  state  sequence, 
N‘^  possible  state  sequences,  and  —  1  additions  (21).  For  N  =  5  and  T  =  100,  there  are  on 
the  order  of  2  •  100  •  5'®®  «  10’*  computations  required.  However,  the  forward-backward  procedure, 
which  is  discussed  below,  offers  an  efficient  way  of  calculating  P(0|A). 

2.4.6  Forward-Backward  Procedure  Both  the  forward  calculation  and  backward  calculation 
of  the  forward-backward  procedure  avoid  the  computational  problems  of  (2.52)  by  using  an  inductive 
approach  which  averts  the  need  for  calculating  P(0,  Q‘|A)  for  all  possible  state  sequences. 
Derivation  of  the  forward  and  backward  algorithms  are  given  is  Appendix  A. 

2.4.6. 1  Forward  Algorithm  The  forward  variable  is  defined  as 

a,,(i)  =  P  (Ot,  O,,  •  •  •  O,, ,  gt,  =  5,  | A)  (2.53) 

or  the  joint  probability  of  the  partial  observation  sequence  ■  •  •  Ot,  and  being  in  state  5,  at  time 

tn,  given  the  model  A  (21)  (22).  at,{i)  is  evaluated  inductively  as  follows; 

1.  Initialization; 


«<,(*)  =  l<i<N  (2.54) 

2.  Induction;  Given  a,,(i)  for  1  <  i  <  iV  calculate  (J)  by  induction; 

jv 

««.+,  0)  =  bj  ^  a,ia,,(t),  1  <  f  <  iV  (2.55) 

1  <  n  <  r  -  1 

An  illustration  of  the  steps  required  to  compute  the  forward  variable  (j)  is  shown  in  Figure 
2.2. 

3.  Termination;  In  final  step,  the  desired  probability,  P(0|A),  is  computed. 

P(OIA)  =  52  a, ^(i)  (2.56) 

tsi 
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Figure  2.2.  Calculation  of  the  Forward  Variable  (j) 


By  (2.54),  the  number  of  computations  required  to  compute  the  a’s  atiiisN  multiplications,  and 
referring  to  Figure  2.2  and  (2.S5),  each  a  for  t„  >  ti  requires  N  +  1  multiplications  and  N  additions 
for  a  total  of  AT  x  T  —  1  a’s.  With  the  N  additions  in  the  termination  step,  the  number  of  computation 
requited  to  compute  P(0|A),  is  iV  +  (iV  +  1)  (AT)  (T  —  1)  multiplications  and  N  (N)  (T  —  1)  +  N 
additions.  Thus,  the  number  of  computations  required  is  on  the  order  of  N^T  (about  3(XX)  for  A^  =  5 
and  T  =  100)  rather  than  the  2TN^  required  by  the  direct  calculation  (21 ). 

2.4.6.2  Backward  Algorithm  The  backward  algorithm  also  offers  an  efficient  way  of 
computing  P(0|A).  The  derivation  of  the  backward  algorithm  is  also  found  in  Appendix  A.  The 
backward  variable  (i)  is  defined  as 

=  =5,,A).  (2.57) 

Hence,  *e  probability  of  observing  the  partial  observation  sequence  •  •  -O,^, 

given  state  5,-  and  time  t„  and  the  model  A  (21 )  (22).  Again  (*)  is  calculated  inductively  as  follows: 
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1.  Initialization:  Ptri*)  initialized  to  1  for  all  i  to  maintain  the  desired  probability, 
explicitly  defined  for  t„  >  tr,  so  the  following  probability  is  arbitrarily  set  to  1  (22). 

=  =  1,  l<i<N 

2.  Induction:  Compute  inductively  using  (i). 

A,(i)  =  Ef=ia.A(Ou+,)A..,0),  r-l>n>l, 

1  <  t  <  AT. 

The  backward  calculation  is  illustrated  in  Figure  2.3. 

3.  Termination:  The  desired  probability  is  conq>uted  as 

P(0,...0,^1A)  =  f;7r.6.(O.,)/0..(i) 


O  is  not 


(2.58) 


(2.59) 


(2.60) 
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The  number  of  calculations  required  for  the  backward  algorithm  is  approximately  the  same  as 
the  forward  algorithm,  so  the  forward  algorithm  holds  no  clear  advantage  over  the  backward  algorithm. 
However,  both  are  required  for  the  solution  of  Problem  3. 

2.4. 7  Best  State  Sequence  Given  the  Observation  and  the  Model  While  Problem  1  has  an  exact 
solution,  there  are  several  possible  solutions  to  Problem  2  which  depend  on  the  optimality  criterion  for 
the  best  state  sequence  Q  given  the  observation  sequence  O  and  the  nnodel  A.  One  such  optimality 
criterion  is  to  choose  the  states  qt,  which  are  individually  most  likely  at  time  t„  (22).  The  solution  for 
this  criteria  is  acconplished  via  the  following  variable  which  is  defined  as 

7.,(i)  =  =5i|0,A)  (2.61) 


or  the  probability  of  being  in  state  5,-  at  time  f„,  given  the  observation  sequence  O  and  the  model  A. 
7t,  (t)  can  be  written  in  term  of  (t)  and  (t)  as  shown  by  the  derivation  in  Appendix  A.  7«,  (i) 
is  expressed  in  terms  of  at,  (t)  and  (i)  as 


7t.(t)  = 


P(0|A) 

The  individually  most  likely  state  qt,  at  time  t„  is  solved  in  terms  of  jt,  (i)  as 

qt,  =  argmax  [7u(*)]>  l<n<T. 


(2.62) 


(2.63) 


Equation  (2.63)  maximizes  the  expected  number  of  correct  states,  but  the  resulting  state  sequence 
may  not  be  valid.  For  example,  using  (2.63)  it  is  possible  to  choose  qt,  =  S3  and  qt,^^  =  52  to  be 
the  most  likely  states  at  t„  and  t„+i  even  though  032  may  be  0.  Because  the  optimal  state  sequence 
defined  by  (2.63)  may  not  be  a  valid  sequence,  it  is  not  used  for  most  applications  (22).  The  most 
widely  used  optimality  criterion  is  one  which  finds  the  single  best  state  sequence  (i.e.  the  sequence 
with  the  highest  probability,  given  the  observation  and  the  nKxiel)  so  that  P{Q  |0  ,  A  )  is  maximized 
(22).  Maximizing  P{Q  |0  ,  A  )  also  maximizes  P{Q  ,  O  |A  ).  The  Viterbi  algorithm  provides  an 
efficient  technique  for  finding  the  single  best  state  sequence. 
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2.4.8  Viterbi  Algorithm  Using  the  Viterbi  algorithm,  the  single  best  state  sequence  is  found 
via  the  quantity 

^.+,0)  =  max  .  (2.64) 

■  «ln 


Where  (j)  is  the  probability  of  the  best  (or  most  likely)  state  sequence  which  also 

passes  through  Sj  at  tn+i  (or  =  Sj)  for  the  observation  sequence,  Oi,  •  •  •  ,  given  A  . 

Solving  for  (j)  directly  will  eventually  incur  the  same  computational  difficulties  as  calculating 
P{0  |A  )  directly  because  there  are  iV‘*  possible  state  sequences  to  test.  However,  if  (j)  is 
evaluated  recursively  in  terms  of  St,(t)  for  1  <  i  <  N,  the  probability  of  only  N  possible  state 
sequences  need  to  be  scored  because  St,(i),  by  definition,  is  the  probability  of  the  most  likely  state 
sequence  prior  to  t„  which  passes  through  5,-  for  1  <  t  <  iV^.  The  recursive  relationship  between 
0)  and  leads  to  the  following  expression, 

^*.+.0)  =  max  K(*)ao)6;  (2.65) 


The  actual  state  sequence  is  tracked  by  recording  the  argument,  1  <  i  <  AT,  (which  is  actually 
the  previous  state)  that  maximizes  (2.65).  This  value  is  denoted  as  the  variable  V’u.f  1 0)-  procedure 
is  summarized  below  (22); 

1 .  Initialization:  The  initialization  step  at  ti  does  not  require  a  scoring  of  the  previous  state  sequence 
because  the  states  prior  to  ti  are  not  defined.  Therefore,  (i)  =  P  (qt^  =  Si,Ot^  |A  )  only 
depends  ontr ,  B  ,  and  Ot^ .  V**,  (i)  is  not  defined  and  is  arbitrarily  set  to  0.  The  initialization 
step  is  as  follows: 


5..(t)  =  mbiiOt,)  l<i<N 
tl’uii)  =  0  l<t<A^ 


(2.66) 
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2.  Induction:  With  initialization  at  t\  complete,  the  recursive  part  of  the  Viterbi  algorithm  can 
proceed. 

=  max  [6t,(i)aij]bj  ,  2<n<T,  l<i<iV  (2.67) 

i  <^<  N 

V’t.+i (i)  =  argmax  [6t,(t)a<,]  ,  2<n<T,  (268) 

1  <i<  N 

3.  Termination:  Once  the  probabilities  of  the  best  state  sequences  up  to  tr- 1  have  been  determined 
the  probability  of  the  final  best  state  sequence,  denoted  P',  is  simply  the  maximum  for 
I  <  j  <  N  and  the  last  state  of  the  sequence  is  the  argument  of  the  maximum  St^.  (j).  Hence, 

P-  =  max  [6,^(t)]  (2.69) 

I  <i<  N 

qij.  =  argmax  (2-70) 

1  <  •  < 

(2.71) 

4.  State  sequence  retrieval:  The  most  likely  state  sequence,  Q  *,  is  found  by  backtracking  through 
the  ^’s  in  the  following  way: 

9.*.  =  (c,,)  n  =  r-l,r-2,--  l  (2.72) 

2.4.9  HMM Synthesis  through  Training  The  third  problem  which  is  to  adjust  the  parameters  of 
the  HMM  model  A  to  maximize  the  probability  of  the  observation  sequence  given  the  model  does  not 
have  an  analytical  solution  (21)  (22).  The  Baum-Welch  iterative  estimation  procedure  does,  however, 
provide  a  method  which  adjusts  the  model  parameters  so  that  P(0  \X  )  is  locally  maximized.  This 
procedure  is  presented  below. 
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2.4.10  Baum-Welch  Reestimation  Procedure  The  Baum-Welch  leestimation  procedure  uses 
two  quantities.  The  first  quantity,  7^.  (i),  has  been  previously  introduced  and  is  defined  in  (2.61)  and 
is  the  probability  of  being  in  Si  at  t„,  given  O  and  A.  The  second  quantity,  (t,  j),  is  defined  as  the 
probability  of  being  in  5,-  at  t„  and  5j  at  given  O  and  A  such  that  (21)  (22) 

=  P{qt,  =  Si,qt,^,=Si\0,\)  (2.73) 


is  also  be  expressed  in  terms  of  the  forward  and  backward  variables  (a  and  p)  as  shown  in 
Appendix  A.  in  terms  of  a  and  P  is 


— 0(0  ii) 


1  <  i  <  AT,  l<j<N 


(2.74) 


The  definitions  of  ^«,(t,i)  and  7t,(t)  are  very  similar.  In  fact,  summing  j)  over  all  j 
yields  the  probability  of  all  transitions  from  5,-  at  t„  or  equivalently  the  probability  of  being  in  5,-  at  t„ 
(21)  (22).  Therefore, 

N 

(2.75) 

i=t 

Another  quantity  of  interest  is  a  value  which  is  proportional  to  the  expected  number  of  times  that  Si  is 
visited  over  the  length  of  the  observation  sequence.  This  quantity  is  found  by  suiiuning  7(,  (i)  over  all 
time  U  <tn  <  tr  (22).  Furthermore,  if  tr  is  excluded  firom  the  summation,  this  quantity  can  then  be 
interpreted  as  being  proportional  to  the  expected  number  of  transitions  from  5,-  during  the  observation 
sequence.  Similarly,  a  quantity  proportional  to  the  expected  number  of  transitions  from  5,  to  Sj  is  the 
summation  ^(,(i,y)  forti  <t„<  tr- 

The  Baum-Welch  reestimation  procedure  is  based  upon  the  expected  value  of  event  occurrences 
or,  in  other  words,  the  concept  of  counting  event  occurrences.  The  quantities  described  above,  which 
are  actually  probability  measures,  provide  a  way  to  reestimate  the  HMM  probability  parameters. 
Accordingly,  the  Baum-Welch  reestimation  procedure  is  (21X22): 

1.  Update  the  estimates  of  the  initial  state  probabilities,  tt,-,  to  the  probability  proportional  the 
expected  number  of  times  in  Si  at  t^.  The  stochastic  constraint,  ¥,  =  1,  is  satisfied  by 
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normalizing  tt,  by  the  probability  which  is  proportional  to  the  expected  number  of  times  in  ail 
states  at  t\.  7(,  (r)  satisfies  this  relationship  so  that  tt,-  is  estimated  as 


=  7«.(t)  (2'76) 

for  I  <i  <  N. 

2.  Update  the  estimates  of  the  state  transition  probabilities,  a,-j,  to  the  probability  proportional  to 
the  expected  number  of  transitions  from  5,-  to  Sj  for  <  t„  <  tr-i .  The  stochastic  constraint, 
=  1,  is  satisfied  in  this  case  by  normalizing  Oij  by  the  probability  which  is  proportional 
to  the  expected  total  number  of  transitions  from  Si. 


-  ElzUtAiJ) 

(2.78) 

ZlZl  c.umt.ii) 

(2.79) 

for  1  <  t  <  jy  and  1  <  j  <  N. 

3.  Update  the  estimates  of  the  state  symbol  probabilities,  llfi(m),  to  the  probability  proportional 
to  the  expected  number  of  times  in  Si  and  observing  O,,  =  for  ti  <  t„  <  ir-  bi(m)  is 
normalized  by  dividing  by  the  probability  which  proportional  to  the  expected  total  number  of 
times  in  Si. 


bi{m) 


^ - ^ 

colyifOi^  =  vm 


(2.80) 


for  1  <  i  <  TV  and  1  <  m  <  M. 


2.4.11  Properties  of  the  Baum-Welch  Reestimation  Procedure  Baum  has  proven  that  the  rees¬ 
timated  model,  A  =  ^A,B  ,  as  defined  by  Equations  (2.76)  -  (2.80),  is  either  at  a  critical  point, 
in  which  case  A  =  A ,  or  A  is  more  likely  than  A  in  the  sense  that  P{0  |  A  )  >  P(0  |  A  )  (7).  Hence, 
the  Baum-Welch  algorithm  insures  that  P{0  |A  )  >  P{0  |A  ).  By  iteratively  using  A  in  place  of 
A ,  the  probability  of  O  given  the  model  is  improved  until  the  critical  point  is  reached.  The  Baum- 


2-26 


Welch  reestimadon  procedure  leads  to  a  local  maximum  only.  For  most  problems,  the  opdmizadon 
surface  is  very  complex  and  has  many  local  maxima  (21)  (22).  Consequendy,  the  cridcal  point  when 
P(0  |A  )  =  P{0  |A  ),  where  A  represents  the  model  at  the  end  of  the  previous  training  loop,  may 
not  be  the  best  possible  model  and  is  dependent  on  the  stardng  point  of  the  model  before  training  (22). 
Mediods  for  choosing  the  starting  model  are  discussed  in  detail  in  references  (11),  (21),  and  (22). 


2.4.12  Left-Right  HMM  Implementation  For  IWR  a  left-right  HMM  is  usually  implemented 
because  the  temporal  relationship  between  phonemes  is  inherently  modeled  by  the  state  transitions  of 
the  HMM  (21)  (22).  A  left-right  HMM,  which  is  shown  in  Hgure  2.4,  is  a  HMM  which  does  not  allow 
state  transitions  to  a  lesser  state.  Also,  the  initial  state  is  always  Si .  The  initial  state  matrix  ir  and  the 
state  transition  matrix  A  for  a  S  state  left-right  HMM  are 

\ 

(2.81) 

/ 


rr  = 


1.0 

0.0 

0.0 

0.0 

0.0 


and 


an 

Ol2 

“13 

“14 

“15 

0.0 

“22 

“23 

“24 

“25 

0.0 

0.0 

“33 

“34 

“35 

0.0 

0.0 

0.0 

“44 

“45 

0.0 

0.0 

0.0 

0.0 

“55 

(2.82) 


Before  inq)lementing  a  left-right  HMM,  two  problems  must  be  resolved.  The  first  problem  is 
not  unique  to  left-right  HMM’s  and  is  directly  related  to  the  dynamic  range  of  the  computer  used  to 
inclement  the  HMM’s.  To  illustrate  this  problem,  consider  the  forward  calculation  of  at,  (i)  defined 
by  Equation  (2.53).  It  can  be  shown  the  at,(t)  is  calculated  as  the  product  of  a  large  number  of  terms 
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Figure  2.4.  S  State  Left-Right  HMM 


that  are  significantly  less  than  1.  Each  calculation  has  the  form  (22) 

( n  n  K.  • 

\»=i  »=i  / 

The  a(,(t)’s  exponentially  approach  zero  as  increases.  For  greater  than  10,  the  computation  of 
a,,  (t)  may  exceed  the  dynamic  range  of  the  computer  and  for  larger  values  of  t„,  (t)  most  certainly 

will  underflow  any  computer.  To  avoid  the  underflow  problem,  both  the  at,  (t)’s  and  the  (t)’s  are 
calculated  using  a  scaling  procedure  (22).  Using  the  same  reasoning  it  can  be  shown  that  the  Viterbi 
algorithm  will  also  underflow  the  computer  for  approximately  the  same  values  of  t„.  However,  the 
underflow  problem  with  the  Viteibi  algorithm  can  be  effectively  avoided  using  logarithms  (22). 

Secondly,  the  left-right  HMM  cannot  effectively  use  a  single  observation  sequence  to  train 
because  the  transient  nature  of  the  states  within  the  mottel  only  allow  a  small  number  of  observations 
for  any  state  for  a  given  observation  sequence.  Therefore,  in  order  to  have  sufficient  data  for  reliable 
estimates,  multiple  observation  sequences  must  be  used  to  train  the  models  (22).  Consequently, 
Equations  (2.78)  and  (2.80)  must  be  modified  to  account  for  multiple  observations.  The  initial  state 
matrix,  it ,  is  not  changed  because  the  model  always  starts  in  Si.  The  forward-backward  variable 
scaling,  >^terbi  algorithm  with  logarithms  and  multiple  observation  sequence  training  procedure  are 
discussed  below. 

2.4.12.1  Forward-Backward  Variable  Scaling  The  goal  of  the  scaling  procedure  is  to 
keep  the  ai,  (t)  and  /?(,  (i)  coefficients  within  the  dynamic  range  of  the  computer.  The  basic  procedure 
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is  to  multiply  a*,(i)  by  a  scaling  factor,  Ct,,  which  depends  only  on  t„  and  is  independent  of  i.  With 
the  scaling  factor  incorporated  the  forward  calculation  is 

1.  Initialization: 


“!,(*)  =  f>i(Ot.)Tri,  l<i<N 


The  scaling  coefficient  is  calculated  for  ti  as 


ct,  = 


and  (t)  (a^,  (t)  after  scaling)  is 


(2.83) 


(2.84) 


(2.85) 


2.  Induction;  Given  Si,(*)  fori  <i  <  N  calculate  (j)  by  induction; 

(i)  =  ^  o,>S4.(i),  l<j<N 


(2.86) 


The  scaling  coefficient  for  t„+i  is  calculated  as 


^•n  +  l 


(2.87) 


and  finally,  S,,^,  (j)  is 


(2.88) 


This  procedure  is  repeated  recursively  for  t\  <t„<  tr-i- 

3.  Teimination;  P(0  |A  )  cannot  be  calculated  using  0|^(j)  directly  because  the  stochastic  prop¬ 
erties  of  the  forward  variable  calculations  have  been  corrupted  by  scaling.  However,  by  induction 
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it  can  be  shown  that  St„  (i),  at,  (i),  and  Ct,  are  related  as  follows  (22). 


0) 


,0)  = 


(2.89) 


where  Ct,+,  =  nl=V,  Hence,  P(0  jA  )  can  be  calculated  in  ternis  of  Ct-j.-  This  is  shown 
by  writing  ct^  as 


Ef=ibi(Oix-.)Er=i“.A.-.(t) 

_ 1 _ 

c,,..  h  {Otr-.)  Er=t  at^at,_,  (t) 

_ 1 _ 

C'tT_i  I^jLl 


It  follows  directly  from  (2.90)  that 


and  that 


CirE^irU)  =  1 

i=i 


i>(0|A)  =  ^  =  -1 

n«=t, 


(2.90) 


(2.91) 


If  scaling  is  required,  P{0  |  A  )  is  also  likely  to  be  out  of  the  dynamic  range  of  the  computer,  so 
the  log  of  P{0  I A  )  is  normally  confuted  such  that 


log[P(0|A)]  =  -£log[c.] 


(2.92) 


The  (i)  coefficients  are  scaled  by  the  same  scaling  factors  for  each  time  t„  as  was  used  to 
scale  the  at.  (i).  The  backward  recursive  procedure  is  now 

1 .  Initialization:  Ptr  (0  is  initialized  to  Ci^  for  all  i. 


Ptrii)  =  ct^,  l<t<^ 


(2.93) 
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2.  Induction:  Compute /9|,  inductively  using 

A. (*)  =  c,.  aijbj  (Ot.^, )  A.+,  (j),  tr-i 

l<i<N. 

3.  Termination:  P(0  |  A  )  cannot  be  computed  after  this  scaling  procedure  because  in  this  case  the 
scaling  coefficients  are  not  explicidy  related  to  the  unsealed  /?<.  (i)  coefficients.  However,  the 
goal  of  this  scaling  procedure  is  to  keep  ti..  A.(t)  coefficients  within  the  dynamic  range  of  the 
conq[>uter  so  they  can  be  effeedvely  used  for  the  Baum- Welch  reesdmadon  procedure. 

2.4.12.2  Vtterbi  Algorithm  with  Logarithms  For  the  \^terbi  algorithm  the  underflow 
problem  is  solved  without  scaling  by  using  logarithm  addidon  in  place  of  the  muldplicadons  in 
Equadons  (2.66)  and  (2.67).  Thus,  the  Viterbi  algorithm  using  logarithms  is  implemented  as  shown 
below. 

1.  Inidalizadon: 


log(«,.(t))  =  log  (7r0  + log  (6,(0,.))  \<i<N 
V’<i  (*)  =  0  1  <  t  < 

2.  induedon: 


(2.95) 


log(^*.+,(j)) 


•nax  [log  (6,,  (*))  4-  log  (o,j )]  +  log  {bj  (O,.^, ) ) ,  (2.96) 

I  <i<  N 


=  argmax  [log  (6, ,(*))-!- log  (o,;)]  , 

1  <i<  N 


(2.97) 


for  h  <tn  <  It  and  1  <  7  < 
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3.  Termination: 


log(P*)  =  max  [log(6,^(z))]  (2.98) 

I  <i  <  N 

Q'it  =  argmax  [log(6,^(z))] .  (2.99) 

t  <i  <  N 

4.  State  sequence  retrieval: 

9:.  =  (9;.,.)  n  =  r-l,T-2,  ..l  (2.100) 

2.4. 12.3  Training  with  Multiple  Observation  Sequences  and  Scaled  Forward-Backward 
Variables  Recall  that  the  Baum-Welch  reestimation  procedure  maximizes  P{0  ‘"jA  )  where  the  O 
denotes  the  A;''*  observation  sequence.  Note,  that  for  the  reestimation  Equations  (2.78)  and  (2.80),  a 
single  observation  sequence  is  impUed.  For  multiple  observation  sequences,  the  goal  is  to  estimate  the 
model  parameters  to  maximize  P{0  |  A  )  for  the  entire  set  of  K  observation  sequences  denoted  as 

o  =  ...  O*  ...  O*')  (2.101) 

where  the  k***  observation  sequence  is 

0‘  =  0‘  ...  0‘^).  (2.102) 
Assuming  that  the  observation  sequences  are  independent, 

K 

P(0|A)  =  ][[P(0‘|A).  (2.103) 

fc=i 
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However,  the  log  of  P{0  |A  )  is  normally  computed  because  even  for  small  values  of  K  and  T  it  is 
likely  to  be  out  of  the  dynamic  range  of  the  computer.  Thus, 


log[P(0|A)]  =  |^log[p(0*|A)] 

fc=i 


(2.104) 


For  the  remainder  of  this  discussion,  let  P{0  *|A  )  =  P*. 

As  shown  in  Section  2.4.10,  the  estimate  of  the  state  transition  probability,  a.-^,  is  a  probability 
proportional  to  the  expected  number  of  transitions  ftom  S,-  to  Sj  for  ti  <t„  <  tr-x  normalized  by  the 
probability  proportional  to  the  expected  total  number  transitions  from  5,.  For  multiple  obser\ations, 
aij  is  a  probability  proportional  to  the  sum  total  of  the  expected  number  of  transitions  from  5,  to 
Sj  for  ti  <t„  <  tr-i  over  the  entire  set  of  K  observation  sequences  normalized  by  the  probability 
proportional  to  the  expected  total  number  transitions  irom  Si  over  the  K  observation  sequences.  Hence, 
the  modified  reestimation  formula  for  a,j-  is 


ZL  jr  <(iK6,  (Ot,)  (3) 


(2.105) 


for  1  <  i  <  iV  and  1  <  j  <  N.  Similarly,  the  modified  reestimation  formula  for  6,(m)  is 


6,(Tn) 


ZLxi:Lx  7^ 

only  if 


(2.106) 


for  1  <  t  <  AT  and  1  <  m  <  M. 

Each  observation  sequence,  fc,  has  its  own  set  of  a(,(t)  and  coefficients  as  denoted  by 
the  superscript  k  in  Equations  (2.105)  and  (2.106).  The  1/P'‘  factor  in  Equations  (2.105)  and  (2.106) 
is  present  because  it  is  part  of  the  definitions  of  (t)  and  (i,j)  and  does  not  cancel  because  those 
quantities  are  summed  over  fc  in  the  numerator  and  denominator.  P(0  |  A  )  cancels  in  Equations  (2.78) 


and  (2.80).  If  the  (()  and  (i)  coefficients  have  been  scaled,  according  to  Rabiner  the  reestimation 

equations  become 


.  ^  £f..  j.  sa  (o!.„)fiL.  (j> 

eJ..  ^  ^  [3f.  m‘.  (i)] 


(2.107) 


for  1  <  i  <  J\r  and  1  <  j  <  N  and 


bi{m)  = 


' - - - - 

EL,^EL,sf.(iWf.(i) 


(2.108) 


for  1  <  i  <  Af  and  1  <  m  <  M.  The  1/cf^  factor  in  the  denominator  of  Equation  (2.107)  is  required 
because  that  scaling  coefficient  is  absent  from  the  numerator.  The  relationship  between  (t),  (i), 

and  Cf^  is  given  by  Equation  (2.89)  and  a  similar  relationship  also  holds  between  the  scaled  and 
unsealed  (t)  coefficients.  Thus, 


=  (n  ‘'f)  f  J  j  A‘„,  (j) 

=  <??,<(•')<>.)»)  (o,*„.)a*.„0)  (2.109) 


Theiefoie,  the  effect  of  the  scaling  coefficients  can  clearly  be  seen  if  Equations  (2. 107)  and  (2. 108)  are 
written  as 


.  _  eL,  ^  eL','  K,,)  ft*.,,  (j) 


a,,  = 


Ef..5fEl;;^K.(i)A*.(i)) 


for  1  <  t  <  JV  and  1  <  j  <  N  and 


bi(m)  = 


' - - - ' 

onlrifO*^  =  »„ 


(2.110) 


(2.111) 


According  to  Rabiner,  dividing  by  P*  removes  the  scaling  factor  from  each  term  before  summing  over 
k  because  of  the  fact  that  P*  =  1/C,*y(22).  However,  Rabiner  does  not  address  how  Equations  (2.107) 
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and  (2. 108)  would  be  implemented  if  is  out  the  dynamic  range  of  the  computer  -  the  primary  reason 

for  scaling  in  the  first  place.  These  issues  are  addressed  in  Section  3.5.1. 


2.5  Classification  Performance  Clarification 

In  Chapter  IV,  estimates  of  the  classification  error  performance  will  be  reported  for  the  various 
classifiers  at  SNR’s  of  20  dB,  IS  dB,  10  dB,  S  dB,  0  dB,  and  -S  dB.  To  present  these  estimates  in  some 
meaningful  way,  the  SNR  will  be  related  to  range  and  the  accuracy  of  the  estimates  will  be  specified. 
Hence,  the  purpose  of  this  section  is  to  clarify  the  results  of  the  classification  experiments  to  follow. 

2.5.1  Relating  SNR  to  Range  The  SNR  can  easily  be  changed  in  computer  simulations  but,  it 
may  not  be  the  most  useful  quantity  for  expressing  the  overall  performance  of  the  RTI  algorithm  under 
test.  Certainly  from  the  pilot’s  perspective,  a  more  useful  quantity  is  the  range  to  the  target,  which  is 
inversely  proportional  to  the  SNR.  Actually,  the  SNR  for  a  particular  range  is  dependent  on  a  number 
of  variables.  One  such  variable  is  the  radar  cross  section  (RCS)  of  the  target.  However,  the  RCS  is  a 
volatile  quantity  which  is  very  dependent  on  aspect  angle;  changing  by  as  much  as  30  dB  for  just  0.2° 
change  in  aspect  angle  (4).  Obviously,  the  SNR  for  a  given  range  is  also  dependent  on  the  radar  system 
parameters.  In  this  section,  the  radar  range  equation  (RRE)  will  be  used  to  compute  the  range  given 
the  SNR  (26). 

A  hypothetical  HRR  linear  FM  pulse  compression  radar,  whose  transmitted  signal  is  same  as 
described  in  Sections  2.2.2  and  3.2.3,  and  a  hypothetical  target  will  be  used.  The  RRE  for  diis 
hypothetical  radar  and  target  is 

R* 

where  R 

Pav 

G 
A 
a 
k 


Pt,„GAa 


(47r)2fcT„F,(B/„T)/pSNR 
range  from  the  radar  to  the  target,  m 

average  transmitted  power  =  34  w 

antenna  gain  =  10, 000 

antenna  effective  aperture  =  1.5  m^ 

RCS  of  target  =  2  m* 

Boltzmann’s  constant  =  1.38  x  10“*®  J/deg 


(2.112) 
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To  =  standard  temperature  =  290  K 
Fg  =  system  noise  figure  =  2 
Bfm  =  receiver  bandwidth  =  750  MHz 
T  =  pulse  width  =  170.67  ns 
fp  =  pulse  repetition  frequency  =  1000  Hz 
SNR  =  Signal  to  noise  ratio 


Although,  the  parameters  of  the  radar  are  carefully  chosen  so  that  reasonable  ranges  for  a  given  SNR 
can  be  realized,  no  attempt  is  made  to  insure  that  the  radar  can  be  made  with  todays  technology.  For 
example,  an  average  power  of  34  w  for  a  pulse  width  of  170.67  ns  and  a  pulse  repetition  frequency  of 
1000  Hz,  requires  a  peak  power  of  200  kw.  At  the  very  least,  it  would  be  difficult  to  fit  an  aircraft  with 
a  transmitter  this  powerful.  To  field  a  system  with  an  antenna  gain  of  10,000  and  system  noise  figure 
of  2  is  equally  as  difficult.  To  further  complicate  the  issue,  the  RCS  of  the  target  fluctuates  wildly  so 
that  the  range  for  given  the  SNR  will  also  fluctuate.  With  all  diat,  the  range  for  a  given  SNR  is  given 
in  Table  2.1. 


Table  2. 1 .  Range  versus  SNR 


SNR 

20  dB 

ISdB 

10  dB 

SdB 

OdB 

-SdB 

Range 

1S.8  km 

21.1km 

28.2  km 

37.6  km 

SO.l  km 

79.5  km 

2.5.2  Determining  Error  Performance  Confidence  Intervals  The  classification  results  pre¬ 
sented  in  Chapter  FV  are  based  on  a  finite  data  set  and  are,  therefore,  only  point  estimates  of  the  actual 
classification  rate,  E  (24).  Let  the  estimate  of  £7  be  defined  as 


L  =  Number  of  misclassified  test  vectors 
P  =  Number  of  test  vectors 


Assuming  that  E  is  a  binomial  random  variable,  the  Classifier  Confidence  Interval  is  related  to  die 
variance  of  E  and  E  is  tht  mean  of  E  (24).  By  expressing  the  distribution  of  .E  as  the  Gaussian 
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approximation  to  the  binomial  distribution,  the  variance  of  E  is  approximated  by  ^(1  —  E)/P.  It  can 
be  shown  that 


Prob^E  —  za{E}  <  E  <  E  —  za{E}'^  «  7 

(2.113) 

(2.114) 

where  z  and  7  are  parameters  of  the  standard  Gaussian  distribution  of  E. 

Thus,  the  actual  classification  rate  is  within  the  confidence  interval,  E  —  zd{E]  <  E  > 
E  —  za{E},  with  a  confidence  level  or  probability  of  7.  For  example,  if  2  =  1.96  the  confidence 
level  is  95%  (7  =  0.95). 

2.6  Summary 

In  this  chapter,  the  background  material  for  a  linear  FM  pulse  compression  HRR  radar,  the  Prony 
model,  and  HMM’s  has  been  presented.  These  three  disciplines  comprise  the  major  components  of  the 
the  RTI  algorithms  developed  for  this  thesis.  Also,  the  SNR  was  related  to  the  range  firom  the  radar  to 
the  target  and  the  accuracy  of  the  classification  rate  estimates  was  defined  to  clarify  the  classification 
performance  reported  in  Chapter  IV.  In  the  next  chr^ter,  the  RTI  algorithm  will  be  described  by  drawing 
upon  the  background  material  presented  here. 
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Ill  RTI  ALGORITHM  DESCRIPTION 


3.1  Introduction 

The  purpose  of  this  chapter  is  to  describe  the  RTI  algorithms  developed  for  this  research  effort. 
The  background  material  presented  in  Chapter  II  forms  the  basis  for  three  major  sub-processes  of  the 
basic  RTI  algorithtit  These  three  major  sub-processes  are  the  front-end  signal  processing  pnocess, 
the  feature  extraction  process,  and  the  classification  process.  The  system  block  diagram  is  shown  in 
Figure  3.1.  What  follows  will  be  a  detailed  description  of  each  element  within  the  block  diagram. 

3.2  Front-end  Signal  Processing 

The  purpose  of  the  front-end  signal  processing  is  to  condition  the  non-causal,  noiseless  Xpatch 
HRR  radar  signatures  so  that  they  emulate  causal,  and  therefore  physically  realizable,  circularly 
polarized,  HRR  Linear  FM,  noise  corrupted  radar  signatures.  This  process  is  described  below. 


3.2. 1  HRR  Radar  Signature  Synthesis  To  reduce  the  data  storage  requirentents,  the  raw  Xpatch 
output  contains  only  the  positive  (one-sided)  spectrum  of  the  band  limited  impulse  response  of  the  target 
Thus,  in  software,  Xpatch  illuminates  the  target  with  a  complex  signal,  which  has  a  flat  spectrum  over 
the  entire  bandwidth  of  approximately  1 .5  GHz.  In  the  discrete  time  domain,  where  n  is  related  to 
range  r„  by  Equation  (2.6),  the  incident  signal  for  a  single  linear  polarization  is 


Ttn 

fc  W  10.0  GHz 


(3.1) 


B  =  —  =  f,  «  1.5  GHz. 


It  should  be  noted  that  T,  is  not  an  even  multiple  of  /«.  As  a  result,  the  discrete  spectrum  of  x[n],  which 
is  periodic  in  multiples  of  27r,  is  not  centered  around  /«.  Although  it  is  not  certain  how  Xpatch  resolves 
this  issue,  the  discrete  sp^  n  of  x[n]  can  easily  be  centered  by  circularly  shifting  the  spectrum  to 
the  tight  about  250  MHz  or  170  discrete  frequency  points.  Thus,  the  incident  signal  in  the  discrete 
frequency  domain  is  the  shifted  DPT  of  x[n]  or 

X[ifc]  =  1,  fc  =  0,1.2,--1023  (3.2) 
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Hguie  3.1.  HRR  Signature  Generation  and  RTI  System  Block  Diagram 


k 


A/  = 


fs 

1024  -  1 


1.467  MHz 


Xpatch  creates  a  fully  polarized  signature  or  range  profile  by  generating  4  signatures  for  each 
signature  calculation:  vertical  receive/vertical  incittent  (w);  vertical  receive/horizontal  incident  (vh); 
horizontal  receiveyhorizontal  incident  (hh);  and  horizontal  receive/vertical  incident  (hv).  The  four 
signatures  are  represented  in  the  frequency  domain  and  consist  of  1024  discrete  frequency  points  as 
defined  in  Equation  (3.2).  By  Equation  (2.7)  the  range  extent  of  the  signature  is  102.4  meters  and  by 
Equation  (2.4)  the  range  resolution  is  ty}proximate]y  0.1  meters.  The  ouq>ut  data  format  of  the  Xpatch 
signatures  is  described  in  Appendix  B. 
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3.2.2  Frequency  Decimation  and  Linear  to  Circular  Polarization  Transformation  As  stated 
in  Section  3.2.1,  the  range  extent  of  the  range  profiles  is  102.4  meters.  However,  the  targets  of  interest 
are  fighter  class  aircraft  which  are  less  than  25  meters  long.  Therefore,  the  output  from  Xpatch  is 
decimated,  keeping  one  of  four  fiequency  samples,  to  increase  A/  by  a  factor  of  4,  thus  reducing 
the  range  extent  to  25.6  meters.  Also,  without  any  loss  of  information  the  four  decimated  linearly 
polarized  Xpatch  range  profiles  are  transformed  into  two  left  circularly  polarized  range  profiles  by 
the  transform  given  by  Equation  (2.1 1).  Hence,  the  fully  polarized  range  profiles  are  represented  by 
vertical  receive/left  circular  incident,  (Xt,;[k]),  and  horizontal  receive/left  circular  incident,  (X/,/[k]), 
range  profiles  where  0  <  k  <  255. 


3.2.3  Linear  FM  Pulse  Compression  At  this  point  the  range  profile  is  essentially  the  impulse 
response  of  target  over  the  finite  bandwidth  B  and  is  firee  of  any  noise.  A  realistic  response  of  the  target 
is  achieved  by  filtering  the  range  profile  by  the  ‘chirped’  HRR  radar  transfer  function  described  in 
Section  2.2.2,  adding  white  Gaussian  noise,  and  then  filtering  with  a  transfer  function  that  is  matched  to 
the  original  ‘chirped’  HRR  radar  transfer  function.  This  transfer  function  is  given  by  Equation  (2.10). 
To  correspond  to  the  Xpatch  output,  the  complex  form  of  this  equation  in  the  discrete  time  domain  is 
given  here. 


m 

n 


0,1,2, ...255 


(3.3) 


fc 


Bfm 


10.0  GHz:  carrier  frequency 
1.5  GHz:  sanqtling  frequency 
750  MHz:  bandwidth 


T  =  170.67  ns  or  25.6  m:  pulse  width 
BfmT  =  128:  pulse  compression  ratio 

Again  the  spectrum  of  /[n],  F[k],  will  not  be  centered  in  the  periodic  spectral  window,  so  F[k]  is 
circularly  shifted  to  center  the  spectrum  as  illustrated  in  Figure  3.2.  In  the  discrete  frequency  domain, 
the  circularly  polarized  Xpatch  signatures,  Xvifk]  and  X’h([k],  are  multiplied  by  F[k].  After  Xvilk] 
and  are  corrupted  by  AWGN,  as  describe  below,  they  are  both  match  filtered  by  F’[k]. 
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Figure  3.2.  Magnitude  Response  of  F[^] 

3.2.4  Noise  Corruption  As  stated  in  Chapter  I,  it  is  assumed  that  the  target  is  in  free  space 
away  from  any  clutter  or  other  noise  sources.  Therefore,  the  primary  sources  of  noise  are  assumed  to 
be  thermal  noise  received  by  the  radar  antenna  and  thermal  noise  in  the  receiver  which  is  adequately 
modeled  as  AWGN  (9).  and  are  transformed  to  the  discrete  time  domain  with  an 

IFFT  routine,  corrupted  by  complex  (complex  because  the  signals  are  coiiqrlex)  AWGN,  and  then 
transformed  back  to  the  discrete  frequency  domain  using  a  FFT  routine.  and  Xh{[A;]  are 

corrupted  by  independent  noise  realizations  because  it  is  assumed  that  they  are  processed  by  a  two 
channel  receiver.  To  set  the  notation,  the  outputs  of  the  matched  filters,  which  now  emulate  noisy 
circularly  polarized  Linear  FM  HRR  radar  signatures,  are  5vi[^]  and  5h{[A;]. 

3.3  Feature  Extraction  via  the  Prony  Model 

The  scattering  centers  of  the  target  are  modeled  using  the  Prony  algorithm  discussed  in  Section 
2.3.  Referring  to  Figure  3.2,  the  frequency  sanqiles  outside  bandwidth  of  fffc]  are  essentially  zero  and 
die  data  set  is  further  reduced  to  the  frequency  samples  within  the  bandwidth  of  F[k].  The  definidon 
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for  the  Prony  Model  of  fully  polarized  range  profiles  is  restated  here  as 


where 


SH,[k]  \ 

S„,[k]  J 


E 


k  =  0,1,2, 


N-1 


(3.4) 


Pt„  =  pole  of  the  n"*  scattering  center 

aHt„  =  an^litude  coefficient  of  the  horizontally  received  n*'*  scattering  center 
Out,  =  amplitude  coefficient  of  the  vertically  received  scattering  center 

T  =  number  of  scattering  centers 
N  =  number  of  discrete  frequencies. 


Recall  that  the  frequency  responses  of  the  scattering  centers  are  represented  by  the  Prony  model, 
therefore  N,  the  number  of  discrete  frequencies,  is  limited  to  the  116  samples  (»  680  MHz)  in  the 
center  of  F[A;]  where  the  frequency  response  is  relatively  flat.  Each  range  profile  is  modeled  using  the  T 
scattering  centers  with  the  most  energy.  Each  scattering  center  along  the  range  profile  is  characterized 
by  5  features  as  described  in  section  2.3.1.  In  matrix  form  each  range  profile  is  written  as 


'  Ip.,1 

AR,, 

TU 

Et, 

rt,  ^ 

range  profile  = 

AR,, 

Ei, 

rt. 

i,  blrl 

ARtT 

TtT 

Etr 

(3.5) 


The  parameter  estimation  procedure  does  not  inl^rently  order  the  scattering  centers  according  range, 
so  after  the  parameter  estimation  procedure  the  scattering  centers  are  sorted  a  placed  in  ascending  order 
according  to  range. 


J.4  Vector  Quantization 

Depending  on  the  classification  algorithm,  (these  algorithms  are  described  Section  3.5)  the  vectw 
quantization  process  assigns  a  code  book  value  which  corresponds  to  a  predetermined  clustering  center 
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within  the  5  dimensional  feature  space  created  by  the  S  parameters  that  define  each  scattering  center 
or  the  5  X  T  dimensional  feature  space  created  by  the  entire  set  of  T  scattering  centers  of  the  range 
profile.  The  location  of  the  clustering  centers  are  found  using  the  K-means  clustering  option  in  LNKnet 
which  is  a  multi-purpose  clustering  and  classification  software  package  developed  by  MIT  Lincoln 
Laboratories. 

Specifically,  the  clustering  centers  are  found  by  normalizing  each  column  of  Equation  (3.S),  with 
respect  to  the  entire  training  data  set,  to  zero  mean  and  unit  variance.  The  training  data  set  contains 
range  profiles  from  both  targets  at  all  aspect  angles  of  interest.  Once  the  clustering  centers  have  been 
defined,  each  normalized  scattering  center  is  assigned  the  codebook  value  which  corresponds  to  the 
closest  clustering  center  with  respect  to  Euclidian  distance.  Range  profiles  from  the  evaluation  data  set 
are  normalized  by  the  same  mean  and  variance  of  the  training  data  set  and  are  vector  quantized  to  the 
same  clustering  centers  derived  from  the  training  data  set 

If  each  scattering  center  of  the  k*'*  range  profile  is  vector  quantized,  this  range  profile  is  repre¬ 
sented  by  a  sequence  of  T  symbols  (integers  in  this  case)  from  the  symbol  set  V  =  {wi,  V2,  .  .  .  ,  Vm,  .  .  . 
where  M  is  the  number  of  symbols.  The  k*'*  range  profile  is  written  as 

O*  =  0*  o*  ...  o*.  ...  o?, 

where  T  is  the  number  of  scattering  centers  and  0(,  =  v^.  Similarly,  if  the  entire  range  profile  is 
vector  quantized  to  one  code  book  symbol,  a  sequence  of  range  profiles  is  represented  in  the  same 
way  except  that  T  corresponds  to  the  number  of  range  profiles  in  the  sequence  and  t„  is  the  n*'*  range 
profile. 

3.5  Classification  using  HMM’s 

Referring  to  Figure  3.1  the  HMM  classification  block  has  2  inputs.  One  of  these  inputs  is  the 
track  history  from  the  tracking  system  and  assumed  to  be  accurate  to  within  ±5°  (16).  The  other  input 
is  the  vector  quantized  range  profiles. 

In  general,  the  classification  process  is  composed  of  two  distinct  procedures:  the  training 
procedure  and  the  classification  procedure.  During  the  training  procedure  a  HMM  (or  several  HMM's 
depending  the  classifier  implementation)  for  each  target  is  synthesized  from  the  training  data  set 
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using  the  Baum- Welch  leestimation  procedure  (21)  (22).  Classification  of  unknown  range  profiles 
(or  sequences  of  range  profiles)  is  accomplished  by  calculating  the  probability  of  the  unknown  range 
profile  given  the  HMM  (or  HMM’s)  from  each  class.  The  class  with  the  highest  probability  is  assigned 
to  the  unknown  range  profile  (or  sequences  of  range  profiles). 

3.5.1  HMM  Implementation  The  temporal  relationship  between  the  scattering  centers  as  well 
as  the  range  profiles  suggest  that  a  left-right  HMM  is  well  suited  for  this  problem.  As  stated  in  Section 
2.4.12,  a  left-right  HMM  does  not  allow  state  transitions  to  a  lesser  state.  Also,  the  initial  state  is 
always  state  1.  In  Section  2.4.12.3,  the  Baum- Welch  training  procedure  with  multiple  sequences  and 
scaled  forward-backward  coefficients  is  discussed  and  the  justification  for  why  Equations  (2.107)  and 
(2.108)  will  not  work  is  given.  In  this  section,  the  training  algorithm  developed  for  this  research  effort 
will  be  described. 


The  modified  leestimation  equations  used  for  this  research  effort  are 


EtLi  ELi  1 

•  It 
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.(i)A‘.(*)' 

(3.6) 


for  1  <  i  <  iV  and  I  <  j  <  N  and 


6,(m) 


ejl,  el.  4-s;.(i)^.(i) 


ELrELr  ^S*.(r)A‘(*) 


(3.7) 


for  1  <  i  <  AT  and  1  <  m  <  M.  To  show  why  Equations  (3.6)  and  (3.7)  are  valid,  first  Equation 
(3.7)  is  expressed  in  terms  of  the  unsealed  a(,(t)  and  Pt,{i)  coefficients  as 


6,(m) 


■yyiio*  = 


Ef.iCf  SL.  ?-of.(0A*.(i) 
EiLrEL,  7^ 

c^ifO*  =  «„ 


Ef=,EL.7f,(i) 


(3.8) 
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for  1  <  t  <  iV  and  1  <  m  <  M.  Thus,  (3.7)  is  equivalent  to  (2.106).  Similarly,  Equation  (3.6)  is 


W.(0] 

•  t» 


(3.9) 


for  1  <  i  <  N  and  1  <  j  <  N.  The  additional  1/cf^  factor  in  Equation  (3.9)  does  not  appear  in 
(2.105),  but  was  emperically  found  to  work  well  for  training  left-right  HMM’s.  A  purely  heuristic 
argument  to  explain  why  (3.9)  works  is  that  the  1/cf^  factor  slightly  enhances  the  more  probable 
observation  sequences  because  the  factors  are  generally  smaller  for  the  more  probable  observation 
sequences.  As  a  result,  the  state  transition  probabilities  are  driven  by  the  most  likely  observation 
sequences.  Because,  (3.9)  is  not  equivalent  to  (2.10S)  the  training  procedure  does  not  insure  that 
P(0  I A  )  >  P(0  I A  ),  but  it  causes  just  enough  equivocation  to  prevent  P(0  |A  )  from  settling  at  a 
less  than  optimum  critical  point.  The  training  procedure  developed  for  this  thesis  and  the  experimental 
results  that  validate  this  training  procedure  are  found  in  (Chapter  IV. 


3.5.2  Classification  Based  on  a  Single  Range  Profile  The  HMM  classifier  can  be  in^lemented 
to  classify  unknown  targets  based  on  a  single  range  profile  or  on  a  sequence  of  range  profiles.  Classi¬ 
fication  using  single  profiles  is  desirable  because  classification  is  nearly  instantaneous  and  it  generally 
requires  only  a  rough  estimate  of  the  aspect  angle.  Using  this  method  several  HMM’s,  one  for  each 
aspect  angle  sector,  can  be  used  to  represent  the  target.  For  exanqrle,  a  HMM  could  be  synthesized 
using  range  profiles  of  the  target  over  a  10°  by  10°  sector  at  the  nose  of  the  target  When  the  tracking 
system  shows  the  target  in  that  sector  the  corresponding  HMM’s  for  each  target  for  that  sector  are  used 
to  classify  the  unknown  target  A  block  diagram  of  such  a  classification  algorithm  for  a  single  sector 
is  shown  in  Figure  3.3. 


3.5.3  Classification  Based  on  Single  Looks  at  Multiple  Range  Profiles  To  identify  the  target 
after  a  sequence  of  range  profiles  is  received  and  processed,  the  single  look  classifier  shown  in  Figure 

3.3  must  be  modified  slightly  so  that  it  waits  to  make  the  classification  decision  after  the  entire  range 
profile  sequence  has  been  processed.  So,  the  classification  decision  is  now  made  by  comparing  the  sum 
of  the  individual  range  profile  log  probabilities  for  each  single  look  HMM’s. 
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Figure  3.3.  Single  Range  Profile  HMM  Classifier 


3.5.4  Classification  Based  on  Sequences  of  Vector  Quantized  Range  Profiles  Although  clas¬ 
sification  based  on  a  single  range  profile  may  be  desirable,  only  part  of  the  available  information  about 
the  target  is  used.  Assuming  that  the  taiget  makes  a  smooth  transition  through  the  aspect  angle  space, 
how  the  range  profiles  change  from  aspect  angle  to  aspect  angle  also  provide  information  that  could  be 
used  for  target  classification  (17). 

One  way  to  model  the  range  profile  changes  is  to  reduce  the  entire  range  profile  to  one  symbol  via 
a  vector  quantization  process.  One  HMM  per  aspect  angle  sector  is  synthesized  for  each  taiget  using 
sequences  of  range  profiles  (one  integer  per  range  profile).  The  classifier  implementation  for  multiple 
range  profiles  is  very  similar  to  the  single  range  profile  classifier  shown  in  Figure  3.3.  The  primary 
difference  between  the  two  is  that  the  HMM  is  designed  to  model  the  change  between  sequences  of 
range  profiles  (each  range  profile  is  represented  by  one  symbol)  instead  of  a  sequence  of  scattering 
centers.  Therefore,  the  classifier  requires  a  number  of  range  profiles  to  make  the  classification  decision. 
Although  the  temporal  relationships  between  the  range  profiles  if  used  with  this  type  of  classifier,  the 


3-9 


temporal  relationships  between  the  individual  scattering  centers  a’ong  the  range  profiles  are  essentially 
lost  by  condensing  all  of  the  information  of  each  range  profile  to  a  single  number. 

3.5.5  Classification  Based  on  the  State  Transitions  of  Multiple  Range  Profiles  Ideally,  the 
classifier  should  model  both  the  temporal  relationships  between  the  scattering  centers  as  well  as  the 
temporal  relationships  between  the  range  profiles.  Figure  3.4  shows  an  implementation  of  such  a 
classifier.  The  HMM’s  for  each  class  on  the  left  of  the  figure  are  Single  Look  HMM’s  in  Figure  3.3. 
The  outputs  of  the  class  1  and  class  2  single  range  profile  HMM’s  are  best  state  sequences  of  the 
individual  range  profiles  as  calculated  by  the  Vertibi  algorithm  described  in  Sections  2.4.8  and  2.4. 12.2. 
The  remaining  HMM’s  correspond  to  states  of  the  single  range  profile  HMM’s.  The  state  1  HMM  for 
class  1  is  trained  to  model  the  scattering  centers  which  belong  to  state  1  of  the  class  1  single  profile 
HMM  for  the  entire  sequence  of  range  profiles.  The  following  example  illustrates  the  process. 

1 .  Let  O  be  a  sequence  of  5  range  profiles  from  an  unknown  target  which  is  modeled  as  having  5 
scattering  centers.  Hence, 
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2.  The  most  probable  state  sequences  through  the  Single  Look  HMM’s  are  determined  for  each  of 
the  above  range  profiles  via  the  Viteii>i  algorithm.  For  this  example,  let  the  Single  Look  HMM’s 
be  3  state  left-right  HMM’s.  Now  suppose  that  the  most  probable  state  sequence  through  one 
the  Single  Look  HMM’s  for  all  of  the  above  range  profiles  is: 

Q*‘  =  1  1  1  2  3 

=  1  2  2  3  3 
<?•’  =  1  1  1  3  3 

Q'*  =  11223 
Q*®  =  1  2  3  3  3 
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3.  The  state  sequences  are  ananged  left  to  right  and  top  to  bottom  as  follows: 


O^'  =  Ol 
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From  the  above  example,  it  is  clear  that  the  number  of  scattering  centers  that  correspond  to  a 
given  state  will  not  necessarily  be  the  same  for  all  range  profile  sequences.  More  importantly,  the 
Single  Look  HMM’s  for  each  class  will  generally  segment  the  range  profiles  to  different  lengths  for 
each  state.  Because  identification  is  based  the  probability  of  the  observation  sequence  given  the  model, 
shorter  sequences  have  a  clear  advantage.  Therefore,  each  state  must  be  assigned  a  standard  length  for 
reliable  classification.  Thus,  all  state  observation  sequences  that  are  longer  than  the  standard  length 
for  that  state  are  truncated  to  the  preassigned  length  and  state  observation  sequences  which  are  shorter 
than  the  pieassigned  length  are  padded  at  the  end  by  a  null  symbol. 

The  final  classification  decision  is  based  on  the  con^arison  of  the  sums  of  the  log  probabilities 
of  the  5  state  HMM’s  for  each  class  and  it  occurs  after  a  predetermined  number  of  range  profiles  have 
been  received  and  processed. 

3.5.6  Classification  Based  on  Uniform  State  Transitions  of  Multiple  Range  Profiles  This 
classifier  is  similar  to  the  classifier  described  above  except,  the  segmentation  of  the  range  profiles  do 
not  depend  on  the  best  state  sequence  of  the  Single  Look  HMM’s.  Instead,  the  range  profiles  are  divided 
evenly.  For  example,  if  each  range  profile  contains  10  scattering  centers,  the  first  two  scattering  centers 
will  correspond  to  the  state  1  HMM  in  Figure  3.4,  the  third  and  fourth  scattering  centers  correspond  to 
the  state  2  HMM,  and  so  on.  The  null  symbol  is  not  required  for  this  classifier  because  each  sequence 
will  have  a  deterministic  number  of  observations.  Again  the  classification  decision  is  based  on  the 
comparison  of  the  sums  of  the  log  probabilities  of  the  S  state  HMM’s  for  each  class  and  it  occurs  after 
a  predetermined  number  of  range  profiles  have  been  received  and  processed. 

3.6  Summary 

In  this  chapter  the  RTI  algorithm  has  been  described  in  detail.  The  front-end  signal  processing 
and  feature  extraction  processing  is  common  to  all  of  the  classification  algorithms  to  be  implemented 
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Figure  3.4.  Multiple  State  Sequential  Range  Profile  HMM  Classifier 


for  this  thesis.  The  vector  quantization  procedure  reduces  scattering  centers  to  a  single  code  book  entry 
or  the  entire  range  profile  to  a  single  code  book  entry  depending  on  the  type  of  classifier  implemented. 
In  all,  S  classifiers  will  be  implemented  for  this  thesis.  The  experimental  results  pertaining  to  all  phases 
of  the  RTI  process  are  given  in  Chapter  IV. 
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IV.  EXPERIMENTAL  RESULTS 


4.1  Introduction 

The  theoretical  basis  for  the  RTI  algorithms  developed  for  this  thesis  is  given  in  Chapter  II,  and 
Chapter  m  contains  a  general  description  these  algorithms.  The  purpose  of  this  ch^ter  is  to  document 
the  experimental  results  pertaining  to  all  phases  of  the  RTI  process.  Accordingly,  this  chapter  is 
arranged  as  follows: 

•  Front-end  signal  processing  and  feature  extraction. 

•  Vector  quantization. 

•  HMM  training  verification. 

•  Classification  based  on  a  single  range  profile. 

•  Classification  based  on  multiple  vector  quantized  range  profiles. 

•  Classification  based  on  multiple  state  separated  range  profiles. 

•  Classification  based  on  all  of  the  above  three  classification  schemes. 

Conclusions  and  recommendations  based  on  these  results  are  discussed  in  Chapter  V. 

4.2  Ftx>nt-End  Signal  Processing  and  Feature  Extraction 

The  purpose  of  this  section  is  to  report  the  experimental  results  which  verify  the  front-end  signal 
processing  and  feature  extraction  processing  which  are  common  to  ail  of  the  classification  algorithms 
implemented  for  this  thesis.  I  he  fiont-end  signal  processing  consists  of  transforming  from  linear 
to  circular  polarization,  filtering  by  a  linear  FM  pulse  compression  transfer  function,  corrupting  by 
AWGN,  and  then  filtering  by  a  filter  matched  to  the  linear  FM  pulse  compression  filter.  The  Prony 
model  parameter  estimation  is  the  heart  of  the  feature  extraction  processing. 

4.2.1  Processing  of  a  Known  Idealized  Range  Profile  The  first  part  of  this  experiment  is 
conducted  with  a  known  idealized  range  profile  of  six  point  scatterers.  The  point  scatterers  are  sampled 


at  8  times  the  bandpass  Nyguist  rate  to  improve  the  granularity  of  the  ensuing  plots.  The  equation  to 
generate  the  point  scatterers  is  expressed  in  the  discrete  frequency  domain  as  follows: 


X^y[k]  = 
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The  EFFT  of  X*y[fc]  has  6  point  targets  at  3.2  m,  6.4  m,  6.6  m,  12.8  m,  13.2  m,  and  24  m.  A  plot  for 
a  single  polarization  of  the  IFFT  of  Xxy[k]  is  shown  in  Figure  4.1.  In  this  plot,  the  range  extent  is 
25.6  meters  and  the  number  of  sample  points  is  2048,  which  corresponds  to  8  samples  per  0. 1  meters 
in  the  discrete  range  domain.  The  RTI  system  block  diagram  shown  in  Figure  3.1  gives  the  general 
signal  flow.  Thus,  the  4  linear  polarized  signals  are  transformed  to  2  left  circularly  polarized  signals 
by  equation  (2. 1 1)  to  yield 


Xhi[k]  =  ^(X;.h[fcl+jA-h„[fc]) 

Xw[k]  =  ;^(X„„(fc)+jXh„[/:]). 

At  this  point,  and  are  multiplied  by  an  over  sampled  linear  FM  transfer  function,  FfA;]. 
F[A:]  is  the  FFT  of  equation  (3.3)  where  T,  —  %■  B  and  n  =  0, 1, 2, ... ,  2047.  Without  adding  any 
noise,  the  2  signals  are  then  matched  filtered  by  F*[Ii;].  Noise  is  not  added  to  these  signals  so  that  the 
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effects  of  the  filtering  and  feature  extraction  can  be  clearly  seen.  To  stay  consistent  with  the  notadon, 
the  signals  out  of  the  matched  filter  are  .Su/fA;]  and  5m[A:]. 

Figure  4.2  shows  the  point  scatterer  at  3.2  m  for  a  single  polarization  before  and  after  filtering  by 
the  linear  FM  pulse  compression  transfer  function  and  matched  filter.  The  distance  between  the  peak 
and  the  first  null  of  the  unfiltered  signal  is  0. 1  m,  which  is  the  expected  range  resolution  for  a  bandwidth 
of  l.S  GHz.  After  filtering  by  the  750  MHz  hnear  FM  transfer  function  and  matched  filter,  the  shape 
of  the  pulse  is  about  the  same  but  the  range  resolution  is  approximately  0.2  m.  Figure  4.3  shows  that  2 
point  scatterers  separated  by  0.2  m  can  be  separated  if  the  bandwidth  is  1.5  GHz.  However,  after  the 
matched  filter  the  two  scatterers  appear  as  1  scatterer.  Referring  to  Figure  3.2,  the  actual  bandwidth  is 
slightly  less  than  750  MHz  and  hence  the  actual  range  resolution  is  just  over  0.2  m.  In  Figure  4.4  die 
point  scatterers  at  12.8  m  and  13.2  m  are  clearly  separated  before  and  after  filtering.  Although  all  of 
the  point  scatterers  are  generated  with  a  magnitude  of  1,  interaction  between  the  scatterers,  especially 
after  the  matched  filter,  causes  the  relative  amplitudes  to  vary. 

5v([A;]  and  Shi[k]  are  identical  in  magnitutte  and  phase,  therefore  the  six  point  scatterers  are 
circularly  polarized.  Before  the  Prony  model  parameter  estimation  procedure,  5v([A:]  and  5«,([k]  are 
truncated  to  the  116  frequency  samples  well  within  the  bandwidth  of  F[A;].  The  resulting  Prony 
model  parameter  estimation  output  is  shown  in  Figure  4.5.  All  of  the  point  scatterers  have  a  circular 
polarization  ellipse.  Also,  the  2  point  scatterers  at  3.2  and  3.4  meters  are  clearly  separated  even  though 
the  parameter  estimation  procedure  inputs  had  bandwidths  considerably  less  than  750  MHz. 
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Figure  4. 1 .  Six  Point  Scatteiers  (Uniform  Bandwidth  of  1 .5  GHz) 


Figure  4.2.  Point  Scatterer  at  3.2  meters,  Before  and  After  Linear  FM  Pulse  Compression 


Figure  4.3.  Point  Scatterers  at  6.4  and  6.6  meters.  Before  and  After  Linear  FM  Pulse  Compression 


Figure  4.4.  Point  Scatterers  at  12.8  and  13.2  meters.  Before  and  After  Linear  FM  Pulse  Compression 
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Figure  4.5.  Six  Circularly  Polarized  Point  Scatterers  Represented  by  the  Prony  Model 
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4.2.2  Processing  Typical  Xpatch  Range  Profiles  The  second  part  of  this  experiment  is  con¬ 
ducted  using  typical  Xpatch  range  profiles.  In  this  case,  the  iiont-end  signal  processing  and  feature 
extraction  are  performed  as  described  in  Sections  3.2  and  4.2  with  a  SNR  of  20  dB  and  /.  =  1.5  GHz. 
The  SNR  is  expressed  in  terms  of  peak  signal  power  to  average  noise  power.  The  noise  corruption 
process  is  performed  as  follows: 

1.  The  horizontal  and  vertical  range  profiles  are  transformed  to  the  time  domain  using  an  IFFT 
routine  and  are  normalized  so  that  the  largest  scatterer  has  an  amplitude  of  1 . 

2.  Independent  Gaussian  noise  is  added  to  each  range  bin.  The  SNR  refers  to  the  level  of  the  noise 
compared  to  unit  amplitude. 

3.  The  range  profiles  are  transformed  back  to  the  frequency  domain. 

Figure  4.6  shows  typical  range  profiles  after  the  matched  filter  from  each  or  the  two  classes  used 
for  this  thesis.  Although  the  change  in  aspect  angle  is  approximately  1°,  the  range  profiles  change 
dramatically.  The  scattering  center  at  12.8  m  which  is  relatively  large  in  Figure  4.6  A  but  it  almost 
disappears  in  Figure  4.6  B.  It  is  also  important  to  note  that  the  relative  distance  between  the  first,  second 
and  third  scattering  centers  of  the  range  profiles  in  figures  4.6  B  and  D  are  almost  identical.  For  this 
reason,  these  two  targets  are  difficult  to  separate  by  classification  algorithms  which  identify  targets  by 
matching  peaks  in  the  range  profile  to  a  templates.  The  GD  algorithm  described  in  Section  1.3.3  is  as 
example  of  this  type  of  algorithm.  The  SNR  out  of  the  matched  filter  is  approximately  10  dB  above  the 
SNR  ratio  prior  to  the  matched  filter.  All  of  the  SNR’s  reported  for  the  remainder  of  the  experiments 
described  in  this  chapter  are  prior  to  the  matched  filter. 

Figures  4.7  A-E  summarize  the  results  of  this  experiment.  The  IFFT’s  horizontal  and  vertical 
range  profiles  out  of  the  matched  filter  are  shown  in  Figures  4.7  A  and  B  respectively.  The  output  of 
the  feature  extraction  process  is  given  graphically  in  a  Figure  4.7  C  and  in  tabular  form  in  Figure  4.7 
D.  Figure  4.7  E  defines  the  parameters  of  the  polarization  ellipses  in  Figure  4.7  C. 

The  scatterer  at  12.3  meters  has  the  highest  energy  despite  the  fact  that  it  is  not  one  of  the 
larger  scatterers  in  Figures  4.7  A  and  B.  The  reason  for  this  is  that  the  energy  calculation,  as  given 
by  equation  (2.14),  is  greatly  effected  by  lpi,|  because  of  the  factor.  Therefore,  scatterers 

with  relatively  large  |p(,|  coefficients  will  have  more  energy.  The  scatterer  at  12.3  meters  has  the 
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Figure  4.6.  Typical  Range  Profiles  fix)in  Qass  1  and  Gass  2  After  the  Matched  Filter 

greatest  energy  because  it  has  the  laigest  |p(,|  coefficient  (|P(,|  =  1.044)  as  shown  in  Figure  4.7 
D.  This  phenomenon  is  further  illustrated  by  the  6  scatterers  between  13.7  and  16.8  meters.  All  of 
these  scatterers  have  relatively  high  amplitudes  in  the  IFFT  range  profiles,  but  they  have  small  |pt,  I 
coefficients  and  extremely  low  energy  -  low  enough  so  that  their  polarization  ellipses  are  too  small  to 
be  seen  in  Figure  4.7  C. 

Its  important  to  note  that  the  amplitudes  of  the  scattering  centers  of  the  horizontal  IFFT  range 
profile  in  Figure  4.7  A  are  generally  larger  than  the  amplitudes  of  the  scattering  centers  of  the  vertical 
IFFT  range  profile  in  Figure  4.7  B.  Accordingly,  the  orientations  of  the  major  axes  (QA)  of  the 
polarization  ellipses  in  Figure  4.7  C  are  generally  tilted  toward  the  horizontal  h  axis. 
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A.  hi  Range  Profile  (IFFT  of  Sfj[kJ) 
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Figure  4.7.  Front-End  Processing  and  Feature  Extraction  for  an  Actual  Range  Profile 
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4.2.3  Range  Profile  Computing  Time  Both  the  front-end  processing  and  feature  extraction 
processing  algorithms  are  implemented  in  M ATLAB  and  each  Xpatch  signature  takes  10  to  IS  seconds 
to  process  on  a  Sun  Sparcstation  2  workstation.  MATLAB  is  inherently  slow  because  it  interprets  the 
program  a  line  at  a  time  instead  of  compiling  the  program  into  machine  language.  The  majority  of 
the  coiTq)uting  dme,  however,  is  spent  doing  the  SVD,  which  MATLAB  has  implemented  in  machine 
language.  As  will  be  shown  in  the  following  section,  the  amount  of  processing  time  per  range  profile 
has  far  reaching  implications. 

4.3  Vector  Quantization 

The  next  step  in  the  RTI  process  is  vector  quantization.  As  discussed  in  (22)  and  shown  in  the 
next  section,  there  is  a  definite  trade-off  between  the  number  of  clustering  centers  or  number  of  code 
book  symbols  and  quantization  error.  That  is,  the  B  matrix  of  the  HMM  must  have  as  many  columns 
as  code  book  symbols.  Increasing  the  size  of  the  B  matrix  increases  the  number  of  parameters  that 
need  to  be  estimated  during  the  HMM  training  procedure  which  in  turn  increases  the  size  required  for 
the  training  set  Thus,  the  purpose  of  this  experiment  is  to  determine  the  quantization  error  versus  M, 
number  of  code  book  symbols. 

Figure  4.8  shows  the  average  quantization  error  versus  M  (on  a  log  scale)  where  each  scattering 
center  of  the  range  profile  is  vector  quantized.  The  clustering  centers  are  found  using  the  K-means 
clustering  option  in  LNKnet  In  all,  280, OCX)  normalized  scattering  centers  (140,000  from  each  class) 
from  28,000  range  profiles  are  used  for  this  experiment.  As  Figure  4.8  clearly  shows,  the  average 
quantization  error  decreases  linearly  as  M  increases  between  16  and  64.  For  M  greater  than  64, 
however,  the  average  quantization  error  decreases  less  between  128  and  1024. 

The  same  experiment  is  conducted  for  the  range  profile  vector  quantization  as  shown  in  Figure 
4.9.  In  this  experiment  a  50  dimensional  feature  vector  is  vector  quantized.  The  relationship  between 
the  average  quantization  error  and  M  is  linear  for  M  less  than  or  equal  to  1024.  The  information 
learned  from  this  experiment  and  the  HMM  training  verification  discussed  in  the  next  section  is  used 
to  determine  the  size  of  HMM’s. 
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4.4  HMM  Synthesis 

With  respect  to  the  HMM  classification  algorithms  developed  for  this  thesis,  the  most  critical 
process  is  the  solution  to  Problem  3  (ref  Section  2.4.4)  -  the  estimate  of  the  parameters  of  the  HMM’s 
given  the  observations.  Of  the  3  HMM  design  problems  stated  in  Section  2.4.4,  Problem  3  is  also  the 
most  difficult  because,  unlike  the  other  2  problems,  it  does  not  have  an  exact  analytical  solution.  With 
this  in  mind,  the  purpose  of  the  experiments  in  this  section  are  two  fold' 

1 .  To  gain  insight  as  to  how  much  training  data  is  required  to  adequately  train  the  HMM’s. 

2.  To  insure  that  the  training  procedure  implemented  provides  reasonable  estimates  of  the  HMM 
parameters. 

4.4.1  Training  Data  Requirements  The  amount  of  training  data  required  depends  on  the  num¬ 
ber  of  model  parameters  (22).  To  make  a  rough  estimate  of  the  amount  of  training  data  required,  this 
experiment  is  conducted  as  follows.  First,  two  hypothetical  left-right  HMM’s  are  created.  Let  the  first 
model  be  labeled  1  and  let  the  second  model  be  la.  These  model  parameters  are 


1.  Model  -1;  (iV  =  5  and  A/  =  32) 
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Figure  4.1 1.  Model  -la  B  la  Matrix  Probabilities 


Next,  Models  1  and  la  are  used  to  generate  observation  sequences  using  the  procedure  given  in  Section 
2.4.3.  Each  observation  sequence  has  of  10  observations.  Figure  4.11  shows  a  histogram  of  the 
generated  observations  that  correspond  to  each  state  normalized  by  the  total  observations  that  occur  in 
that  state.  Ideally,  the  normalized  histogram  of  the  generated  observations  should  be  about  the  same 
as  the  representation  of  B  i  matrix  shown  in  Figure  4.10.  For  Model  1,  it  appears  that  at  stochastic 
properties  of  the  model  are  not  well  represented  for  observation  sequences  of  1000  observations  or 
less.  That  is  to  say,  if  less  than  1000  observation  sequences  are  used  to  synthesize  a  nuxlel  using 
the  Baum-Welch  training  procedure  the  resulting  model  would  show  little  resemblance  to  the  model 
which  generated  the  observation  sequences.  As  can  be  seen  by  Figure  4.13,  the  number  observation 
sequences  required  to  adequately  represent  the  model  also  increases.  From  this  experiment,  one  could 
conclude  that  1000  to  5000  observation  sequences  would  be  required  to  train  Model  1  and  10000  to 
20000  observation  sequences  would  be  required  to  train  Model  la.  Although  not  shown  here,  the 
histograms  of  the  >4  matrices  for  both  models  displayed  similar  properties. 
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Figure  4.12.  Model  -1  Normalized  Observation  Histograms 
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Probability 


A.  500  Observation  Sequences  B.  5000  Observation  Sequences 


C.  10000  Observation  Sequences  D.  20000  Observation  Sequences 


Figure  4.13.  Model  -la  Normalized  Observation  Histograms 
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4.4.2  HMM  Training  The  objective  of  the  second  part  of  this  experiment  is  to  insure  that  the 
training  procedure,  which  will  be  described  below,  yields  reasonable  estimates  of  the  model  parameters. 
To  do  this,  2  additional  left-right  models  are  created,  where  N  =  5  and  M  =  32.  Let  these  models  be 
labeled  Model  -2  and  Model  -3.  Model  - 1  is  the  same  as  above.  The  B  matrices  of  the  3  nKxlels  are 
shown  in  Figures  4. 14  -  4. 16  A.  The  A  matrix  for  models  2  and  3  are 


0.3 

0.4 

0.2 

0.1 

0.0 

0.0 

0.4 

0.3 

0.2 

0.1 

0.0 

0.0 

0.6 

0.3 

0.1 

0.0 

0.0 

0.0 

0.4 

0.6 

0.0 

0.0 

0.0 

0.0 

1.0 

(4.7) 


0.04 

0.09 

0.26 

0.28 

0.32 

0.0 

0.25 

0.27 

0.24 

0.23 

0.0 

0.0 

0.06 

0.68 

0.26 

0.0 

0.0 

0.0 

0.13 

0.87 

0.0 

0.0 

0.0 

0.0 

1.0 

(4.8) 


The  3  models  are  used  to  generate  KXXX)  sequences  (each  with  10  observations)  using  the  procedure 
given  in  Section  2.4.3.  These  observation  sequences  are  then  used  to  train  3  models  using  the  training 
procedure  developed  for  this  thesis.  This  procedure  is  as  follows: 


1.  Start  with  a  uniformly  distributed  left-right  model,  A.  That  is. 
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and 


(4.10) 


6,(m) 


2.  Compute  P{0  |A  ) 

3.  Estimate  the  parameters  of  A  using  equations  (3.6)  and  (3.7). 

4.  Compute  P{0  |A  ). 

5.  Replace  A  with  A  .  (Note:  All  B  matrix  coefficients  less  10~’°  ate  changed  to  10~'".  Rabiner 
has  shown  that  not  letting  the  B  coefficients  approach  0  improves  the  classification  rate  (22).) 

6.  If  P(0  |A  )  —  P{0  |A  )  <  0.01  terminate  the  procedure;  otherwise,  return  to  step  2  and  repeat 
the  procedure. 


The  above  training  procedure  is  used  to  train  3  models;  1  for  each  observation  sequence.  The 
B  matrices  after  training  are  given  in  Figures  4.14  -  4.16  B  and  the  original  B  matrices  and  the 
B  matrices  after  training  are  overlayed  in  part  C  of  Figures  4.14  -  4.16.  The  A  matrices  after  training 
are 
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and 
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Clearly,  the  training  procedure  is  able  to  synthesize  models  that  are  almost  identical  to  the  original 
models  used  to  generate  the  observation  sequences  used  for  training.  The  most  significant  result  of  this 
experiment  is  that  the  models  are  estimated  with  no  -  with  the  exception  of  knowing  that  the  models 
are  left-right  models  -  apriori  knowledge  of  the  original  models. 

In  Figures  4.14  -  4.16  D,  P{0  |A  )  is  plot  as  a  function  of  training  epochs.  Model  -  1,  which 
has  the  least  amount  of  overlap  between  states,  trains  with  about  1/4  the  number  of  epochs  as  Models 
-  2  and  -  3.  Also,  P{0  |  A  )  increases  rapidly  during  the  first  few  training  epochs  for  all  three  models, 
but  for  Models  -  2  and  -3  the  slope  of  P{0  |  A  )  levels  off  and  then  increases  sharply  before  leveling 
off  again.  A  possible  reason  for  this  is  that  the  P{0  |A )  surface  is  more  complex  for  Models  -  2  and 
-3  than  it  is  for  Model  -  1 .  Using  equation  (3.6)  allows  the  training  procedure  to  search  the  P(0  |  A  ) 
surface  for  a  more  probable  model  then  would  be  found  if  the  training  procedure  was  allowed  to  stop 
at  a  the  first  local  maximum. 

The  next  logical  step  is  to  see  if  the  synthesized  models  can  recognize  observation  sequences 
that  are  generated  from  the  corresponding  original  models.  To  do  this,  each  of  original  models  is 
used  to  generate  200  observation  sequences  with  10  observations  each.  Each  observation  sequence  is 
then  scored  by  each  of  the  synthesized  models  and  the  model  with  the  highest  probability  given  the 
observation  sequence  is  declared  the  winner.  Two  measures  are  used  for  scoring; 


•  P(0  |A  )  as  calculated  with  the  scaled  forward  algorithm. 

•  P{0  IQ  *,  A  )  or  the  probability  of  the  observation  sequence  given  the  best  state  sequence  and 
the  model.  P(0  |Q  A  )  was  calculated  using  the  ^terbi  algorithm. 
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The  correct  classification  using  the  P{0  |A  )  metric  is  97.3  %  and  the  correct  classification  for  the 
P{0  IQ  *,  A  )  metric  is  96.5  %.  Also,  the  confusion  matrices  for  the  both  probability  measures  are 
given  in  Tables  4. 1  and  4.2. 


Table  4.1.  Confusion  Matrix  Using  P{0  |A  ) 


Actual 
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Model -1 

Chosen  Model 
Model -2 

Model -3 

Model -1 
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Model -3 
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193 

Table  4.2.  Confusion  Matrix  Using  P(0  |Q  A  ) 
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Model -1 

Chosen  Model 
Model -2 

Model -3 

Model -1 

192 
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Model -2 
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196 
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Model -3 

9 

0 

191 

The  difference  in  the  classification  rates  for  both  decision  metrics  is  not  statistically  significant. 
However,  it  is  interesting  to  note  that  the  primary  source  of  confusion  is  between  Model  -1  and  Model- 
3  even  though  Model  -  1  and  Model  -2  have  identical  state  transition  matrices.  From  this  it  is  apparent 
that  the  B  matrix  parameters  are  more  cri  deal  than  the  A  matrix  parameters. 
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Figure  4.16.  Model  3  Training 
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4.5  Classification  Based  on  a  Single  Range  Profile 

For  this  experiment,  each  HMM  of  the  single  range  profile  classifier  in  Figure  3.3  is  a  left-right 
model  with  N  =  b  and  M  =  128  (S  by  128).  Rabiner’s  research  showed  that  classification  accuracies 
did  not  change  appreciably  for  models  with  5  or  more  states.  For  this  reason,  all  of  the  HMM  used  for 
this  thesis  will  have  S  states.  On  the  other  hand,  the  choice  of  M  requires  a  trade-off  between  average 
vector  quantization  error  and  number  of  training  sequences  required. 

Increasing  M  decreases  the  average  vector  quantization  enor,  but  the  number  of  training  vectors 
increase.  Considering  that  it  takes  10  to  IS  seconds  to  process  every  range  profile  used  for  training,  the 
number  of  range  profiles  required  for  training  is  a  real  issue.  Referring  to  Figure  4.8,  M  =  128  is  right 
at  the  beginning  of  the  ‘knee’  of  the  curve  and  for  M  >  256  the  average  quantization  error  is  nearly 
constant.  Based  solely  on  the  average  quantization  error,  M  =  256  appears  to  be  the  best  choice;  but 
computing  time  constraints  forces  Af  to  be  no  more  than  128.  Even  for  M  =  128,  the  computing  time 
to  process  the  training  data  set  is  well  over  100  houre. 

The  HMM  for  class  1  is  trained  using  14,000  vector  quantized  range  profiles  from  vehicle  1. 
Each  range  profile  consists  of  10  scattering  centers  and  have  a  SNR  of  20  dB.  In  all,  the  training  set 
consists  of  140  noise  independent  range  profiles  for  each  of  the  100  aspect  angles  within  the  aspect 
angle  sector.  The  class  2  HMM  is  trained  with  the  same  type  of  data  from  vehicle  2.  Both  models 
are  trained  using  the  training  procedure  outlined  in  Section  4.4.2.  The  evaluation  data  set  contains  600 
profiles  -  6  for  each  aspect  angle  -  from  each  target  at  SNR’s  of  20  dB,  1 S  dB,  1 0  dB,  and  5  dB,  0  dB, 
and  -  5  dB. 

As  a  comparative  test,  the  same  data  set  is  processed  by  the  K-nearest  neighbor  (KNN)  classifier 
option  in  LNKnet.  To  do  this,  the  range  profiles  from  both  the  training  and  the  evaluation  data  sets  are 
concatenated  so  that  each  range  profile  is  represented  by  a  50  dimensional  feature  vector.  The  LNKnet 
KNN  classification  algorithm  is  run  with  K  equal  to  1, 3,  5,  and  7. 

Figure  4.17  shows  the  classification  results  of  both  classifiers,  for  M  =  128.  As  an  indication 
of  the  accuracy  of  the  error  rate  estimates,  the  classification  confidence  interval  for  a  classification  rate 
of  77  %  is  ±1.7%  with  a  confidence  level  of  95  %  is  shown  in  the  bottom  right  corner  of  Figure  4.17. 
Although  the  performance  of  the  KNN  classifier  is  better  than  the  Single  Look  HMM  at  20  dB  SNR, 
the  HMM  classifier  displays  better  performance  for  SNR’s  less  than  20  dB.  The  performance  of  the 
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KNN  classifier  is  not  greatly  effected  by  the  value  of  K  which  is  an  indication  that  decision  regions 
within  the  SO  dimensional  feature  space  are  relatively  dense. 

To  test  the  effects  of  range  alignment  on  the  classification  performance,  the  target  is  allowed  to 
slide  within  the  range  window  according  to  a  Gaussian  distribution  with  0  mean  and  a  variance  of  2  m. 
The  tracking  system  is  assumed  to  be  accurate  to  within  this  2  m  window  (16).  As  Figure  4.18  shows, 
the  classification  accuracy  drops  about  10  %. 
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Figure  4.19.  Aspect  Angle  Tracks  1  through  3 


4.6  Classification  Based  on  Sequences  of  Range  Profiles 

Three  of  the  classification  algorithms  developed  for  this  thesis  are  specifically  designed  recognize 
the  target  based  on  the  inter  range  profile  temporal  relationship  within  a  sequence  of  range  profile.  These 
algorithms  are  described  in  Sections  3.S.4,  3.5.5,  and  3.5.6.  As  described  in  Section  3.5.3,  the  Single 
Look  classifier  makes  its  decision  based  on  the  the  sum  of  the  log  probability  outputs  over  several 
range  profiles. 

All  of  the  range  profile  sequences  used  to  train  and  test  the  classifiers  have  a  fixed  length  of  10 
range  profiles.  In  all  7  tracks  (sequences  of  range  profiles  through  the  aspect  window)  are  used.  These 
tracks  are  shown  in  Figures  4.19  and  4.20.  Only  tracks  1  and  4  are  used  for  training,  but  all  7  tracks 
are  used  to  test  the  performance  of  the  classifiers.  Specifically,  the  training  set  for  each  class  consists 
of  3(XX)  range  profile  sequences:  15(X)  from  track  1  and  15(X)  from  track  4.  Each  sequence  contains  10 
range  profiles  with  SNR’s  of  20  dB.  The  evaluation  data  set  has  375  range  profile  sequences  from  tracks 
1  -  7  at  SNR’s  of  20  dB,  15  dB,  10  dB,  5  dB,  0  dB,  and  -5  dB.  With  the  exception  of  the  Single  Look 
classifier  which  is  trained  using  the  data  set  described  in  Section  4.5,  these  training  and  evaluation  data 
sets  are  used  in  each  of  the  following  sequential  range  profile  classifiers. 
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Figure  4.20.  Aspect  Angle  Tracks  4  through  7 


4.6.1  Performance  of  the  Single  Look  Classifier  after  Multiple  Range  Profiles  The  perfor¬ 
mance  of  this  classifier  for  tracks  1  and  4  and  tracks  2, 3, 5, 6,  and  7  is  shown  in  Figures  4.21  and  4.22. 
This  classifier  is  track  independent  because  it  is  trained  on  independent  range  profiles,  but  at  0  dB  SNR 
there  is  a  significant  difference  in  classification  rates  between  tracks.  Thus,  there  are  tracks  in  which 
these  two  targets  are  harder  to  separate.  The  two  targets  appear  to  be  closest  within  the  feature  space 
for  track  6,  which  has  the  lowest  classification  rate  as  highlighted  in  Figure  4.22. 
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4.6.2  Performance  of  the  Sequential  Vector  Quantized  Range  Profile  HMM  Classifier  As  with 
the  Single  Look  Classifier,  M  is  chosen  based  on  the  average  vector  quantization  error  and  the  number 
of  sequences  required  for  training.  Referring  to  Figure  4.9,  the  average  quantization  error  decreases 
linearly  as  M  increases.  Although  it  is  apparent  that  M  should  be  larger  than  1024,  the  number  of 
training  vectors  required  to  train  such  a  model  would  be  prohibitive.  For  this  classifier,  M  is  chosen 
to  be  128,  based  solely  on  the  number  of  training  vectors  required.  The  HMM’s  are  trained  using  the 
procedure  outlined  in  Section  4.4.2. 

Figure  4.23  shows  the  classification  performance  for  tracks  1  and  4  at  SNR’s  of  -5  -  20  dB. 
The  classification  performance  for  the  tracks  not  specifically  trained  on  is  shown  in  Figure  4.24.  The 
performance  is  good  for  tracks  1  and  4  at  SNR’s  of  20  dB  and  IS  dB,  but  for  SNR’s  less  than  IS  dB 
the  performance  drops  rapidly.  With  the  exception  of  track  6,  the  classification  for  the  tracks  not  in  the 
training  set  is  almost  good  as  the  performance  for  the  tracks  in  the  training  set.  Increasing  the  code 
book  size  (M )  will  probably  improve  the  performance  of  this  classifier,  but  the  amount  of  training  data 
required  to  train  the  HMM’s  may  make  the  implementation  of  this  type  of  classifier  impractical. 
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Figure  4.23.  Classification  Based  on  Sequences  Vector  Quantized  Range  Profiles  (Tracks  1  and  4 ) 


Hgure  4.24.  Classification  Based  on  Sequences  Vector  (^antized  Range  Profiles  (Tracks  2 , 3,  S,  6, 
and?) 
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4.6.3  Performance  of  the  Multiple  State  Sequential  Range  Profile  Classifier  Each  of  the  5 
HMM’s  which  correspond  to  the  states  of  the  Single  Look  HMM  is  itself  a  S  state  left-right  HMM, 
but  M  is  equal  to  129  instead  of  128.  The  extra  symbol  accounts  for  the  null  symbol  (required  to 
keep  all  state  sequences  the  same  length)  which  is  discussed  in  Section  3.S.S.  For  this  experiment  the 
observation  sequence  length  for  states  1 , 2  and  3  is  limited  (or  extended  if  necessary  by  the  null  symbol) 
to  10  observations.  The  observation  length  for  states  4  and  5  is  set  at  20.  These  observation  lengths 
are  determined  empirically.  That  is,  state  1  through  3  normally  have  observation  sequence  lengths  of 
about  10  while  the  observation  lengths  for  states  4  and  5  are  normally  about  20. 

Each  of  the  range  profiles  within  these  sequences  is  segmented  into  states  using  the  Viterbi 
algorithm  which  is  implemented  using  logarithms  to  avoid  underflow.  Again,  the  HMM’s  are  trained 
using  the  procedure  outlined  in  Section  4.4.2.  Although  this  classifier  is  designed  to  use  all  5  states, 
the  best  performance  is  realized  when  only  states  2,  3,  and  4  are  used.  Thus,  the  classification  results 
reported  here  are  based  on  the  comparison  of  the  log  probabilities  summed  over  states  2,  3, 4  instead 
of  over  all  states. 

Figure  4.25  shows  the  classification  performance  for  track  I  and  4  at  SNR’s  of  -S  -  20  dB.  The 
classification  performance  for  the  tracks  not  specifically  trained  on  is  shown  in  Figure  4.26.  Consistent 
with  previous  results,  the  classification  rate  for  track  6  is  significantly  worse  than  the  other  tracks. 
The  performance  of  this  classifier  will  probably  improve  if  the  training  set  is  increased.  However,  the 
training  set  is  limited  by  computing  time.  For  3,000  range  profile  sequences  of  10  range  profiles  each, 
30,(XX)  range  profiles  must  be  processed.  Essentially,  this  classifier,  as  well  as  the  other  sequential  range 
profile  classifiers,  requires  10  times  the  number  of  training  range  profiles  as  the  single  look  classifier. 

4.6.4  Performance  of  the  Uniform  Multiple  State  Sequential  Range  Profile  Classifier  The 
HMM’s  for  this  classifier  are  all  left-right  models  with  N  =  5  and  M  =  128.  As  described  in  Section 
3.5.6,  the  range  profiles  are  evenly  segmented.  Because  each  range  profile  sequence  has  10  range 
profiles,  each  state  contains  20  observations  per  range  profile  sequence.  As  shown  in  Figures  4.27 
and  4.28,  the  performance  of  this  classifier  is  almost  identical  to  the  performance  of  the  Multiple  State 
classifier.  Again,  track  6  has  a  classification  rate  that  is  significantly  worse  than  the  other  tracks. 
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Figure  4.28.  Uniform  Multip’e  State  Sequential  Range  Profile  Classification  Rate  (Tracks  2,  3,  S,  6, 
and?) 
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4.6.5  Effect  of  Range  Profile  Alignment  on  Overall  Performance  Figure  4.6.5  shows  how  each 
of  the  classifiers  compare  for  tracks  1  and  4  when  the  range  profiles  are  centered  within  the  window. 
Figure  4.6.5  shows  the  performance  of  each  of  the  classifiers  when  the  range  profile  is  allowed  to  vary 
in  range  within  the  range  window  with  of  Gaussian  distribution  (mean  =  0  and  variance  =  2  m).  The 
performance  of  all  of  the  classifiers  is  reduced,  but  the  overall  performance  is  still  very  good.  The 
classification  performance  for  tracks  2,  3, 5, 6,  and  7  shows  similar  sensitivity  to  variations  in  range. 

4.7  Summary 

In  this  chapter  the  experimental  results  which  characterize  the  entire  RTI  process  are  presented. 
The  front-end  signal  processing  and  feature  extraction  processing  is  verified  using  a  known  idealized 
range  profile  and  typical  Xpatch  range  profiles.  The  HMM  training  procedure  is  critical  to  the  overall 
classification  process.  This  procedure  is  verified  by  synthesizing  HMM’s  using  the  training  procedure 
from  sequences  that  are  generated  with  known  models.  The  synthesized  HMM’s  are  almost  identical 
the  original  HMM’s  used  to  generate  the  training  sequences.  The  overall  classification  performance  is 
excellent  with  the  best  results  coining  from  the  single  look  multiple  range  profile  classifier  when  the 
range  profile  is  centered  in  the  range  window.  When  the  location  of  the  range  profile  is  allowed  to 
vary  within  the  range  window,  the  performance  of  the  Uniform  Multiple  State  Sequential  Range  Profile 
classifier  is  about  the  same  as  the  Single  Look  classifier. 


Figure  4.29.  Performance  of  the  Four  Classifiers  (Tracks  I  and  4)  when  the  Range  Profiles  are 
Centered  within  the  Range  Window 


Figure  4.30.  Performance  of  the  Four  Classifiers  (Tracks  1  and  4)  when  the  Range  Profiles  are  not 
Centered  within  the  Range  Window 
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V.  CONCLUSIONS  AND  RECOMMENDATIONS 


5.1  Introduction 

The  goal  of  this  research  effort  is  to  develop  an  RTI  algorithm  designed  to  identify  aircraft 
using  a  Linear  FM  pulse  conr^ression  HRR  radar  as  a  sensor.  As  a  means  of  accomplishing  this  goal, 
the  RTI  process  is  divided  into  three  sub-processes:  front-end  signal  processing,  feature  extraction, 
and  classification.  The  theory  relating  to  these  three  sub-processes  is  presented  in  Chapter  n  and  the 
Chapter  in  describes  the  RTI  algorithms  developed  for  this  thesis  by  drawing  upon  the  theory  presented 
in  Chapter  D.  The  experimental  procedures  and  results  are  given  in  Chapter  IV.  The  purpose  of  this 
chapter  is  to  provide  the  conclusion  and  recommendations  based,  in  part,  on  these  experimental  results. 

5.2  Conclusions 

5.2. 1  Front-end  Signal  Processing  and  Feature  Extraction  The  purpose  of  the  front-end  signal 
processing  is  to  emulate  noise  corrupted,  circularly  polarized  HRR  ‘chirped'  radar  signatures.  To 
validate  this  procedure,  an  idealized,  known  signature  as  well  as  typical  Xpatch  signatures  are  processed. 
These  results,  which  are  given  in  Section  4.2,  compare  well  with  theory.  In  short,  the  front-end  signal 
processing  is  able  emulate  a  physically  realizable  HRR  ‘chirp’  radar.  It  should  be  noted,  however,  that 
this  radar,  which  is  simulated  in  software,  is  itself  an  ideal  radar;  free  of  all  of  the  problems  associated 
with  implementing  systems  in  hardware. 

Based  on  the  classification  performance,  which  is  discussed  below,  the  feature  set  provided  by 
the  Prony  Model  describes  the  target  well.  However,  at  this  point  a  near-real-time  system  could  not 
be  implemented  using  this  feature  extraction  process  because  it  simply  takes  too  long  to  process  the 
range  profiles.  The  SVD  in  the  parameter  estimation  procedure  is  the  primary  source  of  delay.  The 
processing  delay  will  be  decreased  if  the  bandwidth  is  decreased  which  in  turn  will  also  decrease  the 
size  of  the  matrix  on  which  the  SVD  must  be  performed. 

5.2.2  Classification  Performance  The  classification  algorithms  developed  for  this  thesis  are 
based  on  the  temporal  relationships  between:  the  individual  scattering  centers  of  HRR  radar  range 
profiles,  the  temporal  relationships  between  range  profiles,  or  both.  One  of  the  underlaying  assuttq)tions, 
behind  all  of  these  algorithms  is  that  better  classification  performance  will  be  realized  if  sequences  of 
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observations  are  tied  together  in  some  logical  way  and  not  treated  as  individual  events.  The  left-right 
HMM  is  proposed  as  a  possible  way  to  represent  the  temporal  relationships  between  the  observations 
in  these  sequences.  The  classification  of  any  given  HMM  classifier  is  directly  related  to  how  well 
the  parameters  of  the  HMM’s  within  the  classifier  are  estimated  by  the  training  procedure.  Therefore, 
the  training  procedure  is  a  critical  part  of  the  RTI  algorithms.  The  HMM  training  procedure  and  the 
performance  of  the  classifiers  developed  for  this  thesis  are  discussed  below. 

5.2.2. 1  HMM  Training  Clearly,  the  results  presented  in  Section  4.4  show  that  the  HMM 
training  procedure  is  capable  of  estimating  the  model  parameters  to  a  high  degree  of  accuracy.  The 
significant  finding  here  is  that  the  model  parameters  are  estimated  with  no  apriori  knowledge.  The 
classification  performance  of  all  of  the  classification  algorithms  suggest  that  the  training  procedure  is 
also  able  to  accurately  estimate  the  parameters  of  the  HMM’s  used  for  classification. 

5. 2.2. 2  Performance  of  the  Single  Look  Classifier  The  Single  Look  classifier  performs 
well,  but  at  20  dB  SNR  the  KNN  classifier  is  better.  However,  the  Single  Look  classifier  is  more  tolerant 
to  noise.  It  should  be  noted  that  both  the  Single  Look  and  KNN  classification  rate  will  approach  100  % 
if  the  size  of  the  aspect  angle  window  is  decreased.  Because  the  classification  rate  is  well  below  100  % 
for  the  10°  by  10°  aspect  angle  window,  the  size  of  the  aspect  angle  window  represented  by  one  HMM 
is  essentially  limited  to  this  size.  If  more  targets  are  added,  the  size  of  the  window  may  have  to  be 
decreased  further. 

However,  if  the  Single  Look  classifier  waits  to  make  the  classification  decision  after  a  number  of 
range  profiles  are  processed,  the  classification  performance  is  greatly  inqrroved.  For  sequences  of  10 
range  profiles,  the  classification  rate  is  almost  100  %  for  SNR  down  to  5  dB  and  about  95  %  for  a  SNR 
of  0  dB.  At  a  SNR  of  -5  dB,  the  classification  performance  drops  dramatically.  Thus,  this  classifier  is 
good  to  about  0  dB  SNR.  In  addition  to  this  classifiers  high  tolerance  for  noise,  it  is  also  aspect  angle 
track  invariant.  That  is,  it  is  not  sensitive  to  how  the  target  travels  through  the  aspect  angle  window. 

A  major  consideration  in  any  HRR  RTI  system  is  range  profile  alignment  and  most  of  the 
classifiers  in  the  literature  are  extremely  sensitive  to  this  alignment  (17).  However,  the  performance 
of  this  classifier  does  not  decrease  dramatically  when  the  target  is  allowed  to  vary  within  the  range 
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window.  Therefore,  if  the  tracking  system  is  able  to  keep  the  target  centered  within  the  range  window 
within  a  normal  distribution  of  ±2  m,  this  classifier  will  work  without  a  centering  process. 

5.2.2.3  Performance  of  the  Vector  Quantized  Sequential  Range  Profile  Classifier  The 
performance  of  this  classifier  is  good  for  SNR  of  10  dB  and  above,  but  drops  rapidly  below  10  dB. 
However,  even  though  this  classifier  is  trained  on  tracks  1  and  4,  it  is  not  overly  sensitive  to  the  aspect 
angle  track.  The  degraded  performance  below  10  dB  SNR  is  caused  by  a  combination  of  less  than 
adequate  training  and  too  much  error  induced  by  vector  quantizing.  The  solution  to  these  two  problems 
is  to  increase  the  code  book  size  and  the  training  data  set  size.  However,  as  discussed  in  Section  4.6.2, 
if  enough  training  data  is  not  available  for  a  code  book  size  of  128,  the  benefits  of  increasing  the  code 
book  size  to  reduce  the  quantization  error  will  be  degraded  by  reducing  the  training  set  further.  This 
classifier  is  also  not  overly  sensitive  to  range  alignment,  but  the  degraded  performance  below  10  dB 
SNR  limits  its  usefulness. 

5.2.2.4  Performance  of  the  Multiple  State  Sequential  Range  Profile  Classifier  Consid¬ 
ering  that  this  classifier  accounts  for  both  the  temporal  relationships  between  scattering  centers  of  the 
individual  range  profiles  as  well  as  the  temporal  relationships  between  the  range  profile,  this  should 
have  performed  better  the  Single  Look  classifier  which  sees  each  range  profile  as  an  independent  event. 
One  possible  explanation  is  that  by  normalizing  the  range  profiles  some  of  the  information  about  the 
absolute  RCS  and  how  it  changes  with  aspect  angle  is  lost  (4).  In  effect,  the  classification  process  is 
optimized  for  the  Single  Look  classifier  because  the  range  profiles  are  normalized  and  therefore  more 
correlated  between  aspect  angles.  It  should  be  added  that  the  classification  rate  for  this  classifier  may 
iiiqrrove  if  the  size  of  the  training  set  is  increased.  As  with  the  Vector  Quantized  Sequential  Look 
classifier,  this  classifier  is  not  overly  sensitive  to  the  aspect  angle  track  or  range  shifts. 

5.2.2.5  Performance  of  the  Uniform  Multiple  State  Sequential  Range  Profile  Classifier 
Although  the  performance  of  this  classifier  is  slightly  better  when  the  range  profiles  are  centered  than 
the  classifier  discussed  above,  it  is  not  statistically  significantly  better.  However,  this  classifier  is  not  as 
complex  as  the  Multiple  State  Sequential  Range  Profile  classifier  because  the  dependence  on  the  Single 
Look  HMM's  is  eliminated.  Moreover,  when  the  locations  of  the  range  profiles  are  allowed  to  vary  the 
perfonnance  of  this  classifier  is  better  than  the  Multiple  State  Sequential  Range  Profile  classifier  and  is 
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nearly  the  same  as  the  Single  Lock  classifier.  Thus,  the  performance  of  this  classifier  is  comparable  to 
the  Single  Look  classifier  even  though  it  is  trained  with  a  less  than  adequate  training  set. 

5.3  Recommendations 

In  this  thesis  the  problem  is  limited  to  two  classes,  but  the  performance  of  the  Uniform  Multiple 
State  Sequential  Range  Profile  classifier  may  be  better  than  the  Single  Look  classifier  if  the  number 
of  classes  is  increased.  Thus,  first  and  foremost  the  number  of  classes  should  be  increased  so  that  the 
advantages  and  limitations  of  the  different  classifiers  can  be  better  understood. 

Secondly,  the  feature  extraction  processing  time  must  be  addressed  by  either  improving  the 
Prony  Model  parameter  estimation  procedure  or  reducing  the  amount  of  data  which  must  be  processed. 
Short  of  changing  the  Prony  Model  parameter  estimation  procedure,  processing  time  can  be  reduced  by 
decreasing  the  bandwidth.  However,  reducing  the  bandwidth  also  decreases  the  range  resolution.  So, 
there  is  a  clear  trade-off  between  processing  time  and  bandwidth.  Therefore,  the  relationship  between 
bandwidth  and  classification  performance  should  be  investigated. 

In  this  thesis,  three  5  state  left-right  HMM  classifiers  are  implemented,  but  there  are  hundreds 
of  different  HMM  configurations  which  might  be  tried.  Consequently,  continued  research  should  also 
investigate  innovative  ways  of  configuring  the  HMM’s  with  the  goal  of  finding  a  sequential  range 
profile  classifier  with  improved  performance. 

finally,  the  human  visual  system  recognizes  objects  based  on  motion,  color,  and  form.  These 
three  feature  sets  are  processed  in  separate  areas  of  the  brain  before  being  merged  to  form  a  solution 
(32).  Of  course  this  explanation  overly  simplifies  tte  way  the  brain  does  pattern  recognition,  but  the 
point  is  that  within  the  brain,  identification  is  not  based  on  one  feature  set  alone.  In  this  thesis,  an 
attempt  is  made  to  use  features  derived  from  independent  range  profiles  (form)  and  features  derived 
from  sequences  of  range  profiles  (motion)  with  good  results.  Perhaps  the  real  solution  to  the  HRR  RTI 
problem  as  well  as  other  target  recognition  problems  will  be  resolved  when  more  is  understood  about 
the  brain  and  how  it  processes  information. 
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Appendix  A.  DERIVATIONS  OF  IMPORTANT  HMM  VARIABLES 


A.1  Introduction 

In  this  ^pendix  the  derivations  that  apply  to  HMM  will  be  presented.  Specifically,  the  forward 
and  backward  algorithms,  (i),  and  $«.(*,  j)  will  be  dbrivcd. 

A.2  Derivation  of  the  Forward  Algorithm 
The  forward  variable  is  defined  as 

=  =S.|A)  (A.1) 

or  the  joint  probability  of  the  partial  observation  sequence  and  being  in  state  S,-  =  gj, 

at  time  t„,  given  the  model  A  (22)  (21).  (i)  is  evaluated  inductively  as  follows: 

1.  Initialization: 

a,,(t)  =  P(0,.,9«.  =S.|A),  l<i<N 

=  =S.,A)P(g„  =S.|A),  \<i<N  (A.2) 

=  biiOtjTTi,  I  <i<N 

2.  Given  o*,  (i)  for  1  <  i  <  iV  calculate  (j)  by  induction: 

-  l<j<N 

P(0,,0,,--0,.,9.,^,  =S,|A),  l<j<Ar  U<t„<tT-i 

=  l<j<N  U<t„<tT-x 

The  last  term  on  the  right  side  can  be  expressed  in  terms  of  the  known  quantities  Oij  and  (t). 

N 

P  (^«i  >  9<i.+i  ~  1^)  ~  Xrf  ^  '  9u+i  ~  ~  I  A) 

i=l 

=  5^  P  (?<,+>  =  ' '  ■  ^<,1 9<«  = 

•=i 
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It  follows  from  direct  substitution  that 


P  •  ■  •  Ot„,qt,  —  5,1  A) 

N 

•  =1 


(j)  =  bj  ^  aijat,{i),  l<i<N  (A.3) 

fl  <  fn  <  fr-l 

3.  The  final  step  is  the  termination  step  in  which  the  desired  probability,  P(0|A),  is  computed. 

N 

P(Ot,Ot,  -Otr\X)  =  '£PiOt,Ot,  -Otr,gtr=Si\\) 

i=l 

P(0|A)  =  (A.4) 

«=i 

A.3  Derivation  of  the  Backward  Algorithm 

The  derivation  of  the  backward  algorithm  is  similar  to  the  forward  algorithm.  The  backward 
variable  (i)  is  defined  as 

=  =5.,A).  (A.5) 

Hence,  y3».(i)  is  the  probability  of  observing  the  partial  observation  sequence  •  •  -Oi,., 

given  state  5,-  and  time  t„  and  the  model  A  (21)  (22).  Again  the  derivation  proceeds  inductively  as 
follows: 

1.  Initialization:  ^(^(i)  is  initialized  to  1  for  all  i  to  maintain  the  desired  probability.  O  is  not 
explicitly  defined  for  t„  >  tr,  so  the  following  probability  is  arbitrarily  set  to  1  (22). 

=  P(0,^-0,^Jg,^  =  Si,X)  =  1,  l<i<iV  (A.6) 

2.  Induction:  Compute  /9t.  inductively  using  (*)• 

=  P(Ou„-o,rigt,  =  Si,A) 
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s 

=  (0«.+.  •  •  ■  =  Sj\qt^  =  Si,  A) 

j=i 

s 

+  IlPiqt...=SAqi.=Si,X)P{0,^,,-Oi,\qi,^,=Sj,qi,=Si,X) 
i=i 
N 

~  “O-^  (^<«+>  l^t«+j  ■  ■  ■  ^«T>  9u+i  =  9<.  =  '^•1 

i=i 

^  '  ■  ■  ^trkt.+i  =  =  ^fi  A) 

~  (^^<,+1 )  -P  '  •  •  0<j. =  Sj,  qt,  =  Si,  A)  . 

By  the  Markov  property,  0<,^,  •  •  •  O,^  is  independent  of  qt,  =  5,.  Therefore, 


A.(i)  =  =5„A) 


Which  is  the  desired  relationship 


(0«,+j)  A,+i  (j)i 

1  <  t  <  JV. 


(A.7) 


3.  Termination:  The  desired  probability  is  conq)uted  as 


N 

P(0,....0.JA)  =  =5.|A) 

t=i 

w 

=  E  ^  (9‘.  =  ‘5*1'^)  ^  (O.,  •  •  •  O.r  l9«.  =  5.,  A) 

<=i 

N 

—  E  ^'P  ■  * '  ^<T>  9«i  =  5i)  A) 

1=1 

P(0„--0,,|g,.=5.,A) 

n 

=  E’'**«(0‘.)^«.(*)  (A.8) 

1=1 


A.4  Derivation  of  7t,  (t) 


P  (9u  —  •  •  •  Oij.,  A) 


A-3 


P  (O,.  •  •  •  •  •  O,,  |g..  =  5,,  A)  P  («7.„  =  5,|A) 

HU  -  Sj,\)P{g,.  =  5,1A) 

p  (Qe,  ■  Oi,  k.,  =  Si,  X)  P  (q,,  =  5,1  A)  P  (Q,.,,  •  •  •  |g..  =  5,,  A) 

p  (OK  =  5,-,  A)  P (g.,  =  5,|A) 

The  last  factor  in  the  numerator  on  the  right  side  is,  by  definition,  Pt^  (t).  The  remaining  factors  in  the 
numerator  are  equal  to  at,  (t)  as  is  shown  below. 


P(Ott-Ot.K  =  5.,A)P(gt, 


PjOtjOh  —  ^t|A) 

P(9u  =  Si|A) 


=  ««»(*) 

And  similarly,  the  denominator  is  equal  to  P(0|A),  such  that 

N 


gP(0|,..=5,,A)P(,..=5,|A)  = 


=  J^OCtrU) 

i=i 

>=i 

=  P(0|A) 


Now  7i,  (i)  is  expressed  in  terms  of  at,  (»)  and  /?<,  (*)  so  that 

-  P(0|A) 


A.5  Derivation  of  ^t,  (*,  j) 


.  ri  ,1  -  =  =gi|A) 

P{0\X) 


(A.9) 
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_  ^  (Oil  •  •  —  5j|A  )  P  {Ot„^.^  •  •  -OtT'gt.  —  gt.-H  —  ■^>1'^  ) 

P(0|A) 

In  first  term  the  numerator  the  joint  probability  of  Ot^  O*.  and  qt^  —  5,  is  independent  of 
9«»+i  ~  ^j' 

P  (Ou  ■  •  •  =  5,,  9,.,,  =  5,  |A  )  =  P  (O,.  .  •  •  =  5,|A  )  P  (g,.,,  =  5,|A  ) 

=  =5^|A) 

The  second  term  in  the  numerator  can  be  rearranged  so  that 

P  ^tT'9tn  —  ^iiqtn-H  ~^i|A)  =  P(Qt^^^  l^<n  +  l  ’  9«»  5  A  ) 

P{Ot^^„qt,  =  =  SjlA) 

=  ^(^<n+j ‘“^‘T-k^n+i  =  A) -P  =5; [A) 

=  A.+,  (i)P  (O,.*, ,  9..  =  5,,  qt,^,  =  5, 1 A  ) 


The  second  equality  holds  because  Ot,^,  •  •  •  Otj.  is  independent  of  0|,+,  and  gt,  =  5,-.  It  follows  that 


(tniid) 


=gug...,  =5,|A)P(g..,,  = 
P(0  |A) 

Qt.(0/^t.,.f.0')P(O«.-n^gt.  =Si,qt,^^  =Sj\X) 

PiO  lA) 

ttt,  (Qu-f.) 

P(0  lA) 
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Appendix  B.  XPATCH  DATA  FORMAT 

B.l  Introduction 

In  this  i^jpendix  Xpatch  data  format  is  outlined.  Xpatch  is  written  in  FORTAN,  therefore  the 
data  format  is  described  in  those  terms.  The  information  contained  in  this  appendix  was  supplied  by 
Wright-Labs/AARA. 

B.l  Output  File  Format 

The  ou^ut  file  is  an  unformatted  direct  access  file  with  a  record  length  of  32  bytes.  The  first 
several  records  contain  the  simulation  configuration  parameters  and  the  subsequent  records  contain  the 
complex  polarimetric  frequency  samples.  The  number  of  the  header  records  depend  on  the  number  of 
nonuniform  frequency  samples  as  described  in  the  following  paragraph.  The  sample  FORTRAN  open 
statement  shown  for  a  SUN  workstation  has  a  record  length  given  by  the  number  of  bytes.  For  other 
platforms,  such  as  SGI  and  VAX,  the  record  length  is  specified  in  the  number  of  4  byte  words  and  has 
a  value  of  8. 

OPEN{  UNIT  =  lundb, 

:  FILE  =  filnam, 

ACCESS  =  'direct', 

:  FORM  =  'unformatted', 

:  STATUS  =  'old', 

:  RECL  =  32) 

The  variables  contained  in  the  header  records  can  be  read  as  shown  with  the  variable  declarations 
and  descriptions  given  by  Table  Bl.  The  bottom  of  the  code  shows  how  to  read  the  nonunifoim 
frequency  samples  and  assumes  that  an  array  freqSample  has  been  declared  with  maxFreqSamples 
elements. 

READdundb,  REC  =  1)  simTitle(l:32) 

READIlundb,  REC  =  2)  simTitle (33 : 64) 
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READdundb,  REC  =  3)  modelTitle (1 : 32 ) 

READdundb,  REC  =  4)  modelTitle (33 : 64) 

READdundb,  REC  =  5)  (simDated),  i=l,3),  (modelData{i) ,  i=l,3), 


snglOpt,  edgeOpt 

READdundb,  REC  =  6)  multOpt,  rayDensity, 
zoneCellSize,  subdivFac, 
numMaxBounce,  angStartTgtAz, 
angStartTgtEl,  numSigProfileAz 
READdundb,  REC  =  7)  numSigProfileEl,  angSpacingAz, 
angSpacingEl,  freqSpacingType, 
freqStart,  numFreqSamples, 
freqSpacingInc,  numHeadersLeft 

IF  (freqSpacingType  .EQ.  2)  THEN 

IF  (numFreqSamples  .GT.  maxFreqSamples)  THEN 

WRITE (0, *) 'Number  of  frequency  samples  greater  than  ', 

:  'array  dimension' 

WRITE ( 0, *) 'Number  of  frequency  samples  =  ', numFreqSamples 
WRITE ( 0 ,*)' Array  freqSample  dimension  =  ', maxFreqSamples 
STOP 
ENDIF 

freqSample (1 )  =  freqStart 
IF  (numFreqSamples  .GT.  1)  THEN 

numrec  =  (numFreqSamples  -  2)  /  samplesPerRecord  +  1 

DO  10  irec=l, numrec 

istart  =  (irec  -  1)  *  samplesPerRecord  +  2 

lend  =  min(istart+samplesPerRecord-l,  numFreqSamples) 

READ(lundb,REC=nxtrec)  (freqSample (i ) ,  i=istart, lend) 
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nxtrec  =  nxtrec  +  1 


10  CONTINUE 

ENDIF 

ENDIF 


Table  B.l.  Xpatch  Output  File  Header  Variables 


Variable  Name 

Variable  Type 

Variable  Description 

simTitle 

character*64 

title  of  simulation  mn 

modelTitle 

character*32 

title  of  model  file 

simDate 

5  ♦  integer*4 

date  of  simulation  run 

modelDate 

5  *  integer*4 

creation  date  of  model  file 

snglOpt 

integer*4 

single  bounce  option 

edgeOpt 

integer*4 

edge  diffraction  option 

multOpt 

integer*4 

multiple  bounce  option 

rayDensity 

real*4 

incident  ray  density 

zoneCellSize 

real*4 

zone  cell  size  for  ray  processing 

subdivFac 

realM 

ray  tracer  subdivision  factor 

numMaxBounce 

integer*4 

maximum  number  of  bounces  allowed 

angStartTgtAz 

real*4 

starting  target  azimuth  angle 

angStartTgtEl 

real*4 

starting  target  elevation  angle 

numSigProfileAz 

integer*4 

number  of  azimuth  signature  profiles 

numSigProfileEl 

integer*4 

number  of  elevation  signature  profiles 

angSpacingAz 

real*4 

azimuth  angular  spacing  increment 

angSpacingEl 

teal*4 

elevation  angular  spacing  increment 

fieqSpacingType 

integer*4 

sampling  type  (O=uniform,l=nonuniform) 

fieqStart 

real*4 

illumination  starting  frequency 

freqSpacingInc 

real*4 

um'form  frequency  spacing 

numHeadersLeft 

integer*4 

number  of  headers  remaining  in  file 

The  frequency  data  is  written  to  the  output  file  with  the  contents  of  each  record  corresponding  to 
four  complex  double  precision  numbers,  using  complexes  representation.  The  first  complex  number 
in  each  record  represents  the  VV  polarization,  followed  by  VH,  HV,  and  HH  polarizations.  The  data 
records  increment  according  to  increasing  frequency.  A  set  of  numFieqSamples  records  corresponds 
to  a  single  signature  profile.  The  first  signature  profile  corresponds  to  an  orientation  described  by  the 
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target  azimuth  and  elevation  start  angles,  angStartTgtAz  and  angStartTgtEl,  respectively.  An  azimuth 
angle  of  zero  represents  illumination  from  nose  on  and  increasing  azimuth  is  clockwise  when  viewed 
from  above  the  target.  The  target  elevation  is  defined  as  zero  in  the  plane  of  the  target  and  a  positive 
value  indicates  a  view  angle  from  above  the  target  (ARTI  convention). 

Multiple  signature  profiles  within  an  output  file  are  controlled  by  the  angular  spacing  in  the 
azimuth  and  elevation  directions,  angSpacingAZ  and  angSpacingEl,  and  by  the  number  of  signature 
profiles  in  the  azimuth  and  elevation  directions,  numSigProfileAz  and  numSigProfileEl.  The  order  of 
signature  profiles  increment  in  the  azimuth  direction  followed  by  the  elevation  direction.  An  example 
using  FORTRAN  to  read  the  frequency  data  is  shown. 

COMPLEX*8  w,  vh,  hv,  hh 
irec  =  7 

DO  i  =  1,  numSigProfileEl 
DO  j  =  1,  numSigProfileAz 
DO  k  =  1,  numFreqSamples 
irec  =  irec  +  1 

READdundb,  REC  =  irec)  w,  vh,  hv,  hh 
END  DO 
END  DO 
END  DO 

B.3  Utilities 

The  Utilities  directory  under  xpatch  contains  routines  to  examine  the  xpatch  output  file  and 
perform  an  FFT  of  frequency  samples  in  order  to  create  a  range  profile.  The  routine  dumpSP  has  the 
following  syntax: 


dumpSP  infile  [azAngle]  [elAngle]  [numSig]  [outfile] 


where 
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infile  -  name  of  xpatch  output  file 

azAngle  -  azimuth  angle  of  starting  profile  [default  =0.0] 
elAngle  -  elevation  angle  of  starting  profile  [default  =0.0] 
numSig  -  number  of  consecutive  signature  profiles  dumped  [default  =  1] 
outfile  -  formatted  output  file  [default  =  screen] . 
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Appendix  C.  SOURCE  CODE 


C.  1  Introduction 

This  appendix  contains  the  source  code  for  the  subroutines  that  are  used  for  the  Forward- 
Backward  Algorithm,  the  Viterbi  Algorithm  with  logrithnts,  and  the  Baum- Welch  reestimation  algo¬ 
rithm. 


C.2  Forward-Backward  Algorithm 

The  source  code  for  the  Forward-Backward  algorithm  follows  the  notation  as  presented  in  Section 
2.4.12.1. 

|^^^,r|^:^^^^l(^^L^^*^t^t;t‘***^¥*********************i^***********^^***** 

*  Subroutine:  Forward/Backward  Algorithm 

*  Date:  15  July  1992 

*  Author:  Capt  Mark  DeWitt 

#include  <stdio.h> 

#include  <malh.h> 

hthese  functions  can  be  found  in  Numerical  Recipies  for  C4 

float  ***tensoi<); 

float  **matrix(); 

float  ♦vectoK); 

int  **imatrix(); 

void  fwdbckfalpha,  beta,  C,  P,  a,  b,  pi,0,  M,  N,  K,  T) 
float  -^o^alpha,  ***beta,  ♦?,  ♦♦a,  ♦♦b.  **Q,  *pi; 
int  **0,  M,  N,  K,  T; 

/*  input/ouput  variable  discription 

inputs: 

N:  number  of  states 

M:  number  of  observation  symbols 

T:  number  of  observations  per  observation  sequence 

K:  number  of  observation  sequences 

O:  observadons  (KxT  int  matrix  with  first  index;  n 

pi;  initial  state  probabilities  (Nxl  float  vector  first  index:  I ) 

a:  state  transition  probabilities  (NxN  float  matrix  first  index:  1) 

b:  observation  probabilities  (NxM  float  matrix  first  index:  I) 

ouqruts: 

alf^:  forward  variable  coefficients  (KxNxT  float  tensor  first  index:  1 ) 
beta:  backward  variable  coefficients  (KxNxT  float  tensor  first  index:  1) 
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C:  scaling  cefficients  (KxT  float  matrix  first  index:  / ) 

P:  observation  probabilities  (Kxl  float  vector  first  index:  I)^/ 

{ 

int  i,  j,  k,  m,  t; 
float  prob; 

/* Calculate  the  Scaled  Alpha’s  and  Beta’s  for  all  K  observations  using  the  Foward-Backward 
Algorithms^ 

for(k=l;  k<K;  k-H-){ 

/* Calculate  the  alpha’s  for  the  k'th  observation  sequenced 
for(t=l;t<T;t-H-){ 

C[k][t]=0.0; 

for(i=l;  i<N;  i++){/*initialize^ 

C[k][t]  +=  alpha[k]{i][t]  =  pi[i]*b[i][0(k][t]];}  AendiV 
}  /*end  iF/ 
else{ 

for(j=l ;  j<N;  j++){/+recursionV 
for(i=l,  prob  =  0.0;  i<N;  i++){ 

prob  +=a[i]!j]*alpha[k](i][t-l];}  hendiM 
C[k](t]  +=alpha[k]G][t]  =prob*b[j][0[k][t]];}  /♦endjV 
}  I*endelse4 

I*  Scaling  Factor  V 

Ctk][t]  =  l/qk][t]; 

hScale  alpha’sil 
for(isl;  i  <  N; 

alpha[k][i]tt]  =  C[k][t]*alpha[k][i][t];}  /*  end/V 
}  hend  tV 

/*  Calculate  the  probability  of  the  k"th  observation  sequenced 
for(t=l,P[k]=0.0;t<T;t-H-){ 

P[k]  +=  -  (float)loglO((double)C[k][t]);} 

hCalculate  the  beta 's  for  the  k^th  observation  sequenced 
for(t=T;t>l;t — ){ 
if(t  =  T){ 

for(i  =  1;  i  <  N;  i++){  ^initialize  V 
bete[kl[i][t]  =  1.0*C[k][t];} 

}  hend  if 
clse{ 

for(i  =  I ;  i  <  N;  i++){ 

for(j  =  1,  prob  =  0.0;  j  <  N;  j++){  hrecursion^f 

prob  +=  betaEklOKt+ll^aniljlobljllOlkKt-t-l]];}  /♦endjV 
beta[k][i][t]  =  prob  *C[k][t] ;}  hend  iV 
}  h  end  e  lse  M 
}  h  end  tV 
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}  I*  end  k  and  end  fwd-bck  calculations  4 
}  /*end  fwdbck  4 


C.3  Viterbi  Algorithm  with  Logrithms 

The  source  code  for  the  Viterbi  algorithm  follows  the  notation  presented  in  Section  2.4.12.2. 

/ifit******************************************************* 

*  Subroutine:  Viterbi  Algorithm 

*  Date:  15  July  1992 

*  Author:  Capt  Mark  DeWitt 

*♦****♦***♦*******♦**♦♦*♦**♦♦****♦*♦♦♦**♦♦♦**♦♦♦* ********V 

#include  <stdio.h> 

^include  <math.h> 

/*  these  functions  can  be  found  in  Numerical  Recipies  for  C4 

float  ***tensor(); 

float  *4>inatrix(); 

float  *vector(); 

int  ’»>«>imatrix(); 

int  *ivector(); 

void  free.vectorO; 

void  viterbi  Jog(Q,  Pvit,  a,  b,  pi,  O,  M.  N,  K,  T) 
float  **a,  **b,  *pi,  *Pvit; 
int  **0,  M,  N,  K,  T,  **Q; 

!*  input/ouput  variable  discription 

inputs: 

N:  number  of  states 

M:  number  of  observation  symbols 

T:  number  of  observations  per  observation  sequence 

K:  number  of  observation  sequences 

O:  observations  (KxT  int  matrix  with  first  index:  I) 

pi:  initial  state  probabilities  (Nxl  float  vector  first  index:  1 ) 

a:  state  transition  probabilities  (NxN  float  matrix  first  index:  1) 

b:  observation  probabilities  (NxM  float  matrix  first  index:  1) 

outputs: 

Q:  most  probable  state  sequence  (KxT  int  matrix  with  first  index:  1 ) 

Pvit:  probability  of  the  best  state  sequence  (Kxl  float  vector  first  index:  1)4 

\*NOTE:  indxx(x  ,y,  z)  is  a  sorting  routine  (see  Numerical  Recipies  for  C),  where  x  is  the  number  of 
elements  in  the  vector  y  to  be  sorted,  z  (ouput  of  indxx)  is  an  int  vector  containing  the  indicies  of  the 
elements  in  y  in  assending  order.<»/ 

inti.j.k,  m.t; 
float  **delta,  oten^); 
int  **psi,  *indx; 
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I*  allocate  memory  for  the  local  variables  V 

delta=tnatrix(  1  ,N,  1  ,T); 

psi=iinatrix(  1  ,N,  1  ,T); 

tenip=vector(l,N); 

indx=ivector(  1  ,N); 

/*  set  all  zero  probabilities  to  le-20  V 
for(i  =  1;  i  <  N;  i++){ 
if(pi[j]  =  0)  pi[i]  =  le— 20; 
for(j  =  1;  j  <  N;  j++)  if(ati]|j]  =0.0)  a[i][j]  =  le-20; 

} 


for(k  =  1.  Pvit[k]  =  0.0;  k  <  K;  k-H-){ 
for(t=l;t<T;t++){ 
for(i  =  1;  i  <  N;  i++){ 

delta[i][t]  =  (float)(loglO((double)pi[i])  +  logl0((double)b[i][O[k][t]l)); 
psi[i][t]  =  0;}  /*  end  t  V 
}  /*end  if  V 
else{ 

for(j  =  l;j  <  N;j++){ 
for(i  =  1;  i  <  N;  1++) 

temp[i]  =  delta[i][t-l]  +(float)loglO((double)a[i]|j]); 
indexx(N,  temp,  indx); 
psi[j][t]  =  indx[N]; 

delta[j][t]  =  teinp[indx[N]]  +  (float)logl0((double)b[j][O[k][t]]); 

}  /*  end  j  V 
}  /*end  else  V 
}  /*  end  1 4 


for(:  =  T;t>  l;t — ){ 
for(i  =  1;  i  <  N;  i++)  teinp[i]  =  delta[i][t]; 
jndexx(N,  temp,  indx); 
if(t  =  T){Q[k][t]=indx[N];} 
else  {Q[k][t]  =  psilQ[k][t+l]][t+l];} 

}  hend  1 4 

for(i  =  1;  i  <  N;  i-H-)  temp[i]  =  delta[i][T]; 
indexx(N,  ten^,  indx); 

Pvit[k]  =  deita[indx[N]][T]; 

}  /*  end  k  4 

fiee.vector(temp,  1,N); 
fieeJvector(indx,  1,N); 
fieejnatrix(delta,  1,N,1,T); 
free  Jmatrix(psi,  I  ,N,  1  ,T); 

} 
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C.4  Buani'Welch  Reestimation  Algorithm 


The  source  code  for  the  Baum- Welch  reestimation  algorithm  follows  the  notation  presented  in 
Section  3.5.1. 

/***im*****:t,**********:tc*****it******ti*^i**************^***** 

*  Subroutine:  Buam-Welcb  Reestitnation  Algorithm 

*  Date:  15  July  1992 

*  Author:  Capt  Mark  DeWitt 

#include  <stdio.h> 

#include  <math.h> 

hthese  functions  can  be  found  in  Numerical  Recipies  for  C4 

float  ***tensor(); 

float  **matrix(); 

float  ♦vector(); 

int  **imatrix(); 

void  fitee-matrixO; 

void  baum(alpha,  beta,  C,  a,  b,  abar,  bbar,  pi,  pibar,0,  M,  N,  K,  T) 

float  *««alpha,  ’i"t>*beta,  **a,  «4>b,  **C,  *pi,**abar,  **bbar,  *pibar; 
int  **0,  M,  N,  K,  T; 

!*  input/ouput  variable  discnpdon 

inputs: 

N:  number  of  states 

M:  number  of  observation  symbols 

T:  number  of  observations  per  observation  sequence 

K:  number  of  observation  sequences 

O:  observations  (KxT  int  matrix  with  brst  index:  1) 

pi:  initial  state  probabilities  (Nxl  float  vector  first  index:  I ) 

a:  state  transition  probabilities  (NxN  float  matrix  brst  index:  1) 

b:  observation  probabilities  (NxM  float  matrix  Srst  index:  1) 

alpha:  forward  variable  coefbcients  (KxNxT  float  tensor  first  index:  1 ) 

beta:  backward  variable  coefficients  (KxNxT  float  tensor  Srst  index:  1 ) 

C:  scaling  cefficients  (KxT  boat  matrix  Srst  index:  1) 

outputs: 

pi:  updated  initial  state  probabilities  (Nxl  boat  vector  Srst  index:  1) 
a:  ubdated  state  transition  probabilities  (NxN  boat  matrix  Srst  index:  1) 
b:  ubdated  observation  probabilities  (NxM  boat  matrix  Srst  index:  1)4 

{ 

int  i,  j,  k,  m,  t,  tau; 
double  numk,  num,  denk,  den; 
float  prob; 

/*  update  estimate  of  pi’s  4 
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for(i  =  1;  i  <  N;  i++){ 

for(k  =  I,  numk  =  0.0,  denk  =  le-200;  k  <  K;  k++){ 
numk  -»■=  (alpha[k][i][l]’4‘beta[k][i][l]); 
for(j  =  1,  den  =  0.0;  j  <  N;  j++) 
den  +=  (alpha[kl[j][l]*beta[kl[il[l]); 
denk  +=  den; 

} 

pibar[i]  =  (floatXnumk/denk); 

} 

I*  update  estimate  of  a’s,  Eguadon  C3.6>V 

for(i  =  1;  i<N;  i++){ 
for(j  =  l;j  <  N;j++){ 

for(k  =  1,  numk  =  0.0,  denk  =  le— 200;  k  <  K;  k-H-){ 
for(t  =  1,  num  =  le— 200,  den  =  le— 200;  t  <  T— 1; 

num+=alpha[k][i][t]*a[i]Ij]*b[j]lO[k]lt+l]l*betalk]|jl[t+l]/qk][tl; 
den  +=  alphatk][i](t]*beta[k][i][t]/(C[k][t]*C[kl[t]); 

}  /*end  tV 
numk  -1-=:  num; 
denk  +=  den; 

}  /♦  eadkV 

abar[i](j]  s  (float)(numk/dsnk); 

}  /*  end  j  4 
}  /*  eadi4 

I*  update  estimate  ofb’s.  Equation  (3.7)  4 

for(j  =  l;j  <  N;j-H-){ 
for(m  =  1 ;  m  <  M;  m++){ 

for(k  =  1,  numk  =  0.0,  denk  =  le— 200;  k  <  K;  k++){ 
for(t  =  1,  num  =  le— 200,  den  =  le— 200;  t  <  T;  t++){ 
if(m  =  0[k]tt])  {num  +=  alpha[k][j][t]*beta[kl[j][t3/{C[k][t]);} 
den  +=  alpha[kl[jl[tl*beta[k][jl[t]/(C[kl[tl); 

}  /*  endt  4 
numk  ->■=  num; 
denk  4-=  den; 

}  /♦  end  k  4 

bbair|j][m]  =  (floatX  numk/ denk); 
if(bbar|j][m]  <  le— 10)bbaiij][m]  =  le— 10; 

}  h  end  m  4 
}  /*  end  j  4 
} 
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